[Nagios-users] Antwort: How to not send out first service notifications?
[EMAIL PROTECTED] wrote on 19.10.2006 11:04:19:

> I am monitoring some hosts on the Internet for informational reasons. Since these hosts quite frequently have failed services, I'd like my Nagios to refrain from notifying me if a service is down at the first notification. Subsequent notifications, however, should be sent out.
>
> Is there a way to do this any easier than having no notifications set in the service definition and have a service escalation having the list of contacts that used to be in the service definition?

If you only ever want the 2nd notification, then your approach sounds wrong. Rather than always suppressing the first notification, consider raising the number of consecutive failed checks required before the service enters a hard state, so you get fewer false warnings.

regards
Sascha

--
Sascha Runschke
Netzwerk Management IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.: +49 (0) 2150.9153.226
Mobil: +49 (0) 173.5419665
mailto:[EMAIL PROTECTED]
http://www.abit.net
http://www.abit-epos.net

Sicherheitshinweis zur E-Mail Kommunikation / Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
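The effect of raising the number of consecutive failed checks can be sketched with a small model. This is a simplified illustration, not Nagios source code: the function name `first_hard_problem` and the sample result list are made up for the example, and the model assumes that a problem turns hard after `max_check_attempts` consecutive non-OK results and that any OK result resets the count.

```python
def first_hard_problem(results, max_check_attempts):
    """Return the index of the check at which a problem becomes HARD, or None.

    Simplified model (an assumption, not Nagios source code): every non-OK
    result extends a run of failures, the state turns HARD once the run is
    max_check_attempts long, and any OK result resets the run.
    """
    consecutive_failures = 0
    for index, result in enumerate(results):
        if result == "OK":
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index
    return None


# A flaky service that never fails more than twice in a row (made-up data).
flaky = ["CRITICAL", "OK", "CRITICAL", "CRITICAL", "OK"]

print(first_hard_problem(flaky, max_check_attempts=1))  # 0: hard on the first failure
print(first_hard_problem(flaky, max_check_attempts=3))  # None: never reaches a hard state
```

With a threshold of 3, the short outages in this sample never produce a hard state, so no notification would go out for them, while a longer outage still would.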
[Nagios-users] Antwort: Service escalation for service groups?
[EMAIL PROTECTED] wrote on 19.10.2006 11:08:58:

Hi,

in the Nagios 2.x docs, a serviceescalation item can be configured for a host name and a service description. Is there any possibility to define escalation items that automatically apply for all members of a service group?

Example:

define serviceescalation {
        servicegroup_name       WUT-SERVICEGROUP
        first_notification      1
        last_notification       0
        contact_groups          HOST-CONTACTGROUP-SMS,HOST-CONTACTGROUP-MAIL
        notification_interval   10
        escalation_period       24x7
        escalation_options      w,c,r
}

regards
Sascha
[Nagios-users] Antwort: UTF8 Japanese characters in macros
[EMAIL PROTECTED] wrote on 18.10.2006 04:26:19:

> /usr/bin/printf "%b" "* Nagios *\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\n$HOSTNOTES$\n\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!" $CONTACTEMAIL$
>
> [SNIP]
>
> Both printf and mail work fine with UTF-8 Japanese, and tests using only the $HOSTNOTES$ macro yield similar results (nothing output). The CGIs work fine and display the Japanese, so it seems the macro processing is not able to handle the characters. Has anyone had similar problems, and is there a solution?

I can't try the Japanese UTF-8 myself, but are you sure that /usr/bin/printf supports UTF-8? Typing printf at the shell is not the same as using /usr/bin/printf. The bash built-in printf is very different from the external executable and has already caused me lots of headaches. It might be the source of your problem too.

Sascha

PS: This problem is one of the reasons I strongly discourage the current standard of how default notifications are generated...
[Nagios-users] Antwort: Re: security suid/sudo plugins
[EMAIL PROTECTED] wrote on 02.09.2006 18:06:47:

> To make things clearer, the setup I'm proposing is this:
>
> 1. # /usr/local/sbin/visudo
>    ...
>    nagios ALL=(ALL) NOPASSWD: /usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/etc/check_logfiles.cfg
>
> 2. # vi /usr/local/nagios/etc/nrpe.cfg
>    ...
>    command[check_logfiles]=/usr/local/bin/sudo /usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/etc/check_logfiles.cfg
>
> 3. # grep nagios /etc/passwd
>    nagios:x:1123:100:Nagios Remote User:/usr/local/nagios:/usr/bin/bash
>
>    Note to Hari: my understanding is that sudo won't work for an account that doesn't have a valid shell. Certainly all my testing led me to that conclusion.
>
> 4. # passwd -l nagios
>
> It's not clear to me exactly what the security risk is. The idea is that someone may gain access to an unprivileged account on the system and then use this access and this Nagios plugin to cause malicious damage? Or to break the root account? In which case, it would all come down to how secure the code of the plugin is. Is this correct?

Looks OK so far; you just have to make sure of one BIG issue: /usr/local/nagios/libexec/check_logfiles MUST NOT be owned by the nagios user/group, and the nagios user/group MUST NOT have write permissions on it. Imagine someone doing:

    cp /usr/bin/bash /usr/local/nagios/libexec/check_logfiles

With the sudoers entry above, that would hand the nagios account a root shell.

In regard to the security of the plugin code itself, you're more or less on the safe side here. Since you hardcoded the parameters of the root call, you cannot suffer from buffer overflows caused by malicious parameters. Exploiting the plugin via the logfiles themselves is firstly most unlikely and secondly would mean someone has already compromised the system; otherwise he couldn't forge syslog entries ;)

regards
Sascha
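The ownership and write-permission rule can be checked mechanically. The following is a minimal sketch for a POSIX system, not part of the original advice: the helper names `writable_by` and `plugin_is_safe` are hypothetical, only plain mode bits are examined (ACLs are ignored), and the extra check on the containing directory is an added assumption, based on the fact that a user who can write to a directory can replace files in it.

```python
import os
import stat


def writable_by(st, uid, gids):
    """True if a user with this uid and these group ids may write to the
    file described by the os.stat_result `st` (mode bits only, no ACLs)."""
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)


def plugin_is_safe(path, uid, gids):
    """True if neither the plugin nor its directory is owned or writable
    by the given (unprivileged) user."""
    directory = os.path.dirname(os.path.abspath(path))
    for target in (path, directory):
        st = os.stat(target)
        # An owner can always chmod the file, so ownership alone is unsafe.
        if st.st_uid == uid or writable_by(st, uid, gids):
            return False
    return True
```

To apply it to the setup in this message, one would look up the uid and group ids of the nagios account (for example with the standard `pwd` and `grp` modules) and call `plugin_is_safe("/usr/local/nagios/libexec/check_logfiles", uid, gids)`.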
[Nagios-users] Antwort: Re: Recovery not getting sent during downtime?
[EMAIL PROTECTED] wrote on 31.07.2006 19:42:08:

> Pardon me. But what is the problem? You have a problem. It triggers an alert. You act and 'fix' it by scheduling downtime. Then you bring the system alive within the allocated maintenance window.

I'm sorry, but I have a hard time pardoning such rude answers.

> It seems there is little point in sending a status change if you declare a maintenance window. Then it is obviously a planned action and there is no need to send out any alert.

There's a big point in sending RECOVERY even in scheduled downtimes. I already mentioned it, but I don't mind explaining it again to you.

If a service failed before a downtime, it needs to send out recovery notifications even when in downtime. Take my example: service goes critical, every admin gets notified, host goes into downtime, host reboots, service goes OK after reboot, no notifications get sent because the host is still in downtime. Every admin who is currently offsite can't know about the reboot (because it happened in downtime and no one noticed except the one who rebooted) and will never get notified about the recovery.

That poses a big problem for bigger companies that have numerous admins getting notifications from Nagios. If such a case occurs, I usually get a call from my CTO asking why problem XY hasn't been fixed yet, and I have to tell him: I fixed it with a reboot, but Nagios didn't send out the recovery because of the downtime. It wastes time and resources. Time and resources mean money. Wasted money is bad.

sincerely
Sascha
[Nagios-users] Antwort: Re: Antwort: Re: Recovery not getting sent during downtime?
[EMAIL PROTECTED] wrote on 02.08.2006 18:04:06:

> The problem is that it sounds like you're using scheduled downtimes incorrectly. It's not meant to be used for *un*scheduled downtimes; thus the name. It's meant to suppress alerts from a machine during the specified window, and that's exactly what it's doing in your case.

It is a scheduled downtime. I put the host into downtime because I was planning to reboot it. I do not want notifications to be sent out for the reboot of course, so I am forced to set a downtime.

> I can tell you that I'd be really annoyed if it didn't work as advertised, and *did* send alerts in the middle of the night when I was working on a box and someone else was carrying the pager (well, I might not be the guy to get annoyed, but I'd hear about it in the morning).

I'm not talking about sending alerts. I'm talking about sending recoveries for alerts that happened _before_ the downtime, not for suppressed alerts _during_ the downtime. For those a recovery should never be sent, of course, as no alert has been sent.

> It sounds like you should probably be acknowledging critical services, rather than marking them as being in scheduled downtime when they're not. That way the alerts stop until the service comes back up, and you'll be notified when it changes state.

If I acknowledge the problem, everyone gets a notification too. Where's the benefit? And acknowledging the problem doesn't make any difference:

    Service goes critical
    SMS gets dispatched
    Problem gets acknowledged
    SMS gets dispatched
    Host gets scheduled for downtime
    Reboot
    Host/Service OK

Still no notification for the rest of the admins that the service is fixed. For them it looks like it's acknowledged and I'm working on it, but there is never any sign that I fixed it. The problem persists and there is still no progress for me, as the outcome stays the same: no one gets notified that the problem was fixed, and people will call to ask whether I am still working on it.

I still say: for every WARNING/CRITICAL/UNKNOWN that has been sent, there must follow a RECOVERY if the service/host recovers. That's expected behaviour, and everything else is rather diffuse behaviour in my opinion. (Unless I explicitly suppress all notifications for the service/host in question, but downtime shouldn't work this way.)

regards
Sascha
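The rule argued for in this message can be written down as a small decision function. This is only a sketch of the proposed behaviour, not of what Nagios does or did: the `ServiceState` class, the `notify` function and the event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    # True while contacts have been told about a problem that has not
    # yet been followed by a recovery notification.
    problem_notified: bool = False
    in_downtime: bool = False


def notify(state, event):
    """Return True if a notification should go out for `event`.

    `event` is "PROBLEM" or "RECOVERY". Problems are suppressed during
    scheduled downtime; a recovery is sent whenever a problem notification
    went out earlier, even if the host is in downtime by now.
    """
    if event == "PROBLEM":
        send = not state.in_downtime
        if send:
            state.problem_notified = True
        return send
    send = state.problem_notified
    state.problem_notified = False
    return send


state = ServiceState()
print(notify(state, "PROBLEM"))   # True: service goes critical, admins are told
state.in_downtime = True          # host is put into downtime for the reboot
print(notify(state, "PROBLEM"))   # False: outage caused by the reboot is suppressed
print(notify(state, "RECOVERY"))  # True: the earlier alert gets its recovery
```

A problem that first appears during the downtime never sets `problem_notified`, so its recovery stays silent as well, which matches the distinction drawn above.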
[Nagios-users] Antwort: Recovery not getting sent during downtime?
[EMAIL PROTECTED] wrote on 02.08.2006 18:36:18:

> I think you might want to rethink your process so that it matches the paradigm of the tool you are using. It already can do what you want if you just work with the way it was designed, rather than forcing a process that breaks the paradigm.

I think you might want to re-read my mail; it seems you did not fully understand what I meant ;)

>> Service goes critical
>
> This is not a scheduled event. It is unscheduled.

I never said anything else. I never said I scheduled a downtime for the SERVICE. I said I scheduled a downtime for the HOST, because I was forced to reboot the HOST to fix the SERVICE problem. I didn't have the possibility of acknowledging the service, fixing it and being happy. Sometimes certain services of some retarded OS tend to castrate themselves and only a reboot can fix it.

If I do not schedule a HOST downtime, then SMSs get dispatched for the HOST going down and coming up, and then for the service recovering. Not exactly the behaviour I'd like to see.

> Since the unscheduled event has already been acknowledged and everyone who might want to jump in to help already knows it is being handled, there is no need to schedule a downtime for an unscheduled event. Just acknowledge it.
>
>> Reboot
>> Host/Service OK
>
> Recovery Note goes out to everyone, including CTO. Problem Solved. Everyone knows what is happening. Your performance evaluation gets a boost for being the one to solve the problem.

Uhm, and what about the 2 SMS going out to everyone stating that the host went down and up again? That is neither expected (by the rest of the admins, for example) nor wanted behaviour.

I DO know what the documentation says regarding scheduled downtimes. I DO know it says it suppresses all alerts. Yet I still say it should not suppress recoveries for critical/warning/unknown states which happened before a scheduled downtime. Somewhere else in the documentation (or was it by Ethan somewhere else?) it states that for every alert sent, there will be a recovery if it goes OK again.

And alas, Ethan seems to agree with me, so I can't be that wrong, eh?

regards
Sascha
[Nagios-users] Antwort: Recovery not getting sent during downtime?
[EMAIL PROTECTED] wrote on 02.08.2006 18:42:56:

> Or you could temporarily disable notifications for the host during the reboot. The Nagios docs are pretty clear that "[w]hen a host or service is in a period of scheduled downtime, notifications for that host or service will be suppressed." It's working as designed.

It's working exactly as phrased in this part of the documentation. But we're not the church and the documentation ain't the bible, therefore it can be faulty ;) All I'm saying is that the documented and observed behaviour does not make all that much sense when viewed from certain angles.

> I think we also have different definitions of scheduled. That's not a word I would use to describe rebooting a box to bring a failed service back up.

Well yes, that might be. But if I understand you correctly, then your definition of scheduled means that it has to be planned long beforehand for a certain time period. That's not what it's meant for. There's a reason why you cannot enter fixed downtimes for certain periods, as in scheduling them ahead. You can only schedule downtimes instantly with a click; they're meant to be used like that, and they don't leave you any other choice.

I do not really see a difference between rebooting a machine because of a BIOS update or because some service hangs. It's a reboot and I planned it all by myself of my own free will. Therefore I will schedule a downtime, as I know it will go down when I press that reboot button. That's a scheduled downtime for me.

> But you are talking about sending alerts. Recovery alerts are still alerts, and alerts are suppressed during scheduled downtimes.

Well, that's playing with words. Of course I'm talking about critical/warning/unknown/down when talking about alerts. I thought that was rather clear from my wording. Alerts are critical, warning, unknown, down. Recoveries are recoveries. Alerts and recoveries together are notifications. But recoveries are not alerts in my understanding.

>> (unless I explicitly suppress all notifications for the service/host in question - but downtime shouldn't work this way)
>
> But it does. You're expecting it to do something other than what it's designed for, and to behave in a way other than how it's documented.

Well, that's the question. Was it designed that way? I don't think so. Is it documented and implemented that way? Yes, it is. Though the latter is true, that doesn't mean I cannot argue my point, as it is a much better approach to the problem.

The notification system must be made aware of why a certain notification gets sent and decide whether it has to send it or not. Scheduled downtime should not blindly block everything. If it did, then where's the difference between scheduling downtime and just disabling notifications for a host or service?

regards
Sascha

PS: Why hasn't HP released a new Centrino driver for their notebooks yet? We are in dire need of an update because of the newly discovered security hole ;)
[Nagios-users] Recovery not getting sent during downtime?
(repost from nagios-devel)

Hi folks,

I'm currently using Nagios 2.0b3 (never change a running system ;)) and ran into the following problem:

    - Service went critical
    - SMS and emails got dispatched
    - found problem, decided to reboot the machine to fix it
    - scheduled downtime for host
    - rebooted host
    - everything went OK again
    - no SMS/email got dispatched to state the service recovered though!

I'm unsure whether this problem has already been fixed; I didn't find any real evidence on Google or in the changelogs. Fixes in the recovery logic and the notification system itself were documented, but not in much detail.

Question: is this a bug or a feature? If it is a bug, has it been fixed in a newer release which I can update to?

It poses a problem to us, as admins who are currently offsite don't get messages that the problem is already OK. So we get quite a few unnecessary phone calls to check on a problem that is already solved.

Here's an excerpt of how it looked in the Nagios log:

[1153954542] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153954600] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;2;Connection refused
[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused
[1153954660] SERVICE NOTIFICATION: RGingter;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: MArslan;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: IT_Service;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153955260] SERVICE NOTIFICATION: RGingter_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
[1153955260] SERVICE NOTIFICATION: MArslan_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
...rest of alerts snipped out...
[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;1153980509;1153981829;1;0;7200;technik;Neustart MAr
[1153980519] HOST DOWNTIME ALERT: NSEXT01;STARTED; Host has entered a period of scheduled downtime
[1153980595] HOST ALERT: NSEXT01;DOWN;SOFT;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980605] HOST ALERT: NSEXT01;DOWN;SOFT;2;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] HOST ALERT: NSEXT01;DOWN;HARD;3;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] SERVICE ALERT: NSEXT01;PING;CRITICAL;HARD;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980687] SERVICE ALERT: NSEXT01;CPU;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;UPTIME;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;DISK_C;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980707] HOST ALERT: NSEXT01;UP;HARD;1;OK - 10.150.1.2: rta 1.382ms, lost 0%
[1153980707] SERVICE ALERT: NSEXT01;PING;OK;HARD;1;OK - 10.150.1.2: rta 3.307ms, lost 0%
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;MEMUSE;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_D;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_E;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153980976] SERVICE ALERT: NSEXT01;CPU;OK;HARD;1;CPU Load 37% (10 min average)
[1153980976] SERVICE ALERT: NSEXT01;UPTIME;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 5 minute(s)
[1153980976] SERVICE ALERT: NSEXT01;DISK_C;OK;HARD;1;C:\ - total: 3.00 Gb - used: 2.05 Gb (68%) - free 0.95 Gb (32%)
[1153981105] SERVICE ALERT: NSEXT01;MEMUSE;OK;SOFT;2;Memory usage: total:1951.26 Mb - used: 434.44 Mb (22%) - free: 1516.82 Mb (78%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_D;OK;SOFT;2;D:\ - total: 5.43 Gb - used: 2.46 Gb (45%) - free 2.97 Gb (55%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_E;OK;SOFT;2;E:\ - total: 67.83 Gb - used: 14.92 Gb (22%) - free 52.91 Gb (78%)
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime

Any insight into this would be appreciated.

sincerely
Sascha
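The log excerpt can be checked mechanically to confirm that the NOTES recovery fell inside the downtime window. The sketch below assumes only the line layout visible in the excerpt, `[epoch] TYPE: field;field;...`; the `parse_line` helper and the regular expression are written for this example and are not part of Nagios.

```python
import re
from datetime import datetime, timezone

# "[epoch] ENTRY TYPE: semicolon-separated fields", as seen in the excerpt.
LOG_LINE = re.compile(r"^\[(\d+)\] ([^:]+): (.*)$")


def parse_line(line):
    """Split a log line into (epoch seconds, entry type, list of fields)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Nagios log line: {line!r}")
    epoch, kind, rest = match.groups()
    return int(epoch), kind, rest.split(";")


downtime = ("[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;"
            "1153980509;1153981829;1;0;7200;technik;Neustart MAr")
recovery = ("[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;"
            "TCP OK - 0.070 second response time on port 1352")

# Fields 2 and 3 of the external command are the downtime start and end.
_, _, command = parse_line(downtime)
start, end = int(command[2]), int(command[3])

when, kind, fields = parse_line(recovery)
print(kind, fields[1], fields[2])                                # SERVICE ALERT NOTES OK
print(datetime.fromtimestamp(when, tz=timezone.utc).isoformat())  # recovery time in UTC
print(start <= when <= end)                                       # True
```

The recovery at 1153980828 lies between the downtime start (1153980509) and end (1153981829), which is consistent with the recovery notification having been suppressed by the downtime.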
Antwort: [Nagios-users] How to reduce a very high latency number
[EMAIL PROTECTED] wrote on 17.05.2006 20:09:16:

> I am still butting up against very high latency issues with my Nagios setup. I feel like I must be missing something obvious because it doesn't seem like I have so many services that the servers cannot keep up.
>
> nag2: 193/1743
>
> Machine hardware: 1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache 7200rpm drives

To me this is obviously a performance issue related to hardware. Your machines have way too little RAM. It is simply not possible to run 1800 checks on a 512MB machine in a timely manner.

Think about this: every time Nagios starts a check, it forks a child, which forks the check. Nagios usually uses 26MB of total memory per process, the check maybe another 5MB. When running 1800 checks, we are speaking of spreading roughly 55 GIGAbytes of needed RAM (1800 x 31MB) over 512MB of real RAM. Imagine how often that works without the machines doing a shitload of swapping and io-wait.

I really cannot imagine how such a machine can NOT swap when running Nagios. Are you totally sure that you did not make a mistake when checking the machine?

Here's a snapshot of our dedicated Nagios server, which is a minimal install of RHES4 with only Nagios/Apache running on it (and the HP Insight Manager tools and TSM backup client, but that should not really matter that much ;)):

top - 11:48:52 up 69 days, 19:10,  1 user,  load average: 0.75, 0.70, 0.67
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.3% us,  4.3% sy,  0.0% ni, 62.5% id, 23.9% wa,  0.0% hi,  0.0% si
Mem:   3116384k total,  2341696k used,   774688k free,    55188k buffers
Swap:  6291448k total,      144k used,  6291304k free,  2148772k cached

This is an HP DL380, 3.6GHz Xeon with 3GB of RAM and a RAID5. It is currently running only 120 hosts with around 500 checks, but those are on a high-frequency schedule (~400 checks per minute), as those are the company-critical services. Therefore it is under real pressure, as you can see from the 2.3GB memory usage and the 0.75 load with only 500 checks. But I think it is roughly comparable to your triple amount of checks.

You should really, really upgrade the RAM in the machines. In my opinion that would solve most of your problems, as I imagine you have a lot of io-wait on this machine (which you can check with an up-to-date top, by the way ;))

regards
Sascha
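The memory estimate in this message is simple arithmetic and can be reproduced as follows. The per-process figures (26MB and 5MB) and the rounded count of 1800 checks come from the message; the function name is made up, and the result is a worst-case upper bound that assumes every check runs at the same moment and no memory is shared between processes.

```python
NAGIOS_FORK_MB = 26    # memory per forked Nagios process, as quoted in the message
CHECK_PLUGIN_MB = 5    # memory per running check plugin ("maybe", per the message)
CHECKS = 1800          # rounded number of service checks used in the message
PHYSICAL_RAM_MB = 512  # RAM of the machine in question


def worst_case_memory_mb(checks, per_fork_mb, per_check_mb):
    """Upper bound: all checks in flight at once, nothing shared."""
    return checks * (per_fork_mb + per_check_mb)


total_mb = worst_case_memory_mb(CHECKS, NAGIOS_FORK_MB, CHECK_PLUGIN_MB)
print(total_mb)                           # 55800
print(total_mb / 1000)                    # 55.8 (GB, decimal)
print(round(total_mb / PHYSICAL_RAM_MB))  # 109 times the physical RAM
```

In practice checks are spread over the check interval, so far fewer than 1800 run concurrently; the figure is best read as an illustration of scale rather than a measurement.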
Antwort: [Nagios-users] check_dig bug ?
> [EMAIL PROTECTED] stucky]# /usr/local/nagios/libexec/check_dig -w 1 -c 2 -H {nameserver} -l {fqdn} -a {ip}
> DNS WARNING - 0.011 seconds response time ({fqdn} 38400 IN A {ip})|time=0.011227s;1.00;2.00;0.00
>
> Have I totally gone nuts or did I not just tell check_dig to only warn me if the query takes more than one second? As you can see, the tool itself reports it took only 0.011 seconds, so why the warning? It's annoying because I get those random fake alerts and another recovery message soon after. Thing is, even if I totally leave the -w and -c flags out, it'll still do that, as if it had a hardcoded value between 0.008 and 0.011 in there that can't be changed.

This is a Red Hat issue; they screwed up something in the futex handling of their kernel. Do a manual up2date -uf to force the update of the kernel packages, which are usually excluded by up2date. Reboot - be happy.

This has been fixed with hotfix-kernel-2.6.9-22 (not publicly available) and later on with the last major update 3 (Nahant).

regards
Sascha
Antwort: [Nagios-users] Issue with Check_ssh
[EMAIL PROTECTED] wrote on 20.03.2006 16:01:10:

> I am using check_ssh to check if ssh is running on some hosts. On a number of hosts the check is running OK; on others I am getting "check_ssh: Could not parse arguments". I can log on using ssh to the hosts that are giving me the parsing error. Can anyone help me to resolve this Nagios error?

Well, maybe we could, but not if you do not provide us with the configs of the service in question. It seems obvious that you are passing wrong arguments to check_ssh; at least that's what it says. Otherwise there isn't much to guess without major interference from my mystic crystal ball.

regards
sash
Antwort: Re: Antwort: [Nagios-users] Issue with Check_ssh
Gordon Stewart [EMAIL PROTECTED] wrote on 20.03.2006 16:23:22:

> The command line I am having trouble with is
>
> define command{
>         command_name    check_ssh
>         command_line    $USER1$/check_ssh -t 10 -p 99 $HOSTADDRESS$
>         }
>
> As I said, it works with some hosts but not all hosts.

Well, then you would do well to post the other relevant info too, such as the service definition and the host definition. It seems that $HOSTADDRESS$ sometimes returns something wrong; maybe you mixed up the host address and the host alias in a few host definitions?

regards
sash
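For reference, here is a minimal sketch of a host definition showing which directive feeds which macro. The host name, alias text, IP address, and the generic-host template are placeholders assumed for illustration; none of them come from the original post.

```
# Hypothetical host entry; every value below is a placeholder.
define host{
        use             generic-host          ; assumed template supplying the remaining required directives
        host_name       sshbox01
        alias           SSH box in rack 12    ; free text, expanded by $HOSTALIAS$
        address         192.0.2.10            ; IP or resolvable name, expanded by $HOSTADDRESS$
        }
```

If the free text had ended up in the address directive instead, the command line above would expand to something like "check_ssh -t 10 -p 99 SSH box in rack 12", and the extra words may well be what the plugin fails to parse. Comparing the address lines of a working and a failing host is a cheap first check.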
Re: [Nagios-users] Installing Nagios RPM
[EMAIL PROTECTED] wrote on 05.01.2006 10:56:11:

> Is it seen as recommended behaviour to use Antwort: instead of Re:? Especially when sending mail to a mailing list that runs in English? SCNR ;-)

I didn't take any offense ;) But it's not something I can do anything about. Lotus Notes sets it and you cannot change it by any means. I do manually edit the subject if it has too many "Antwort: Re: Antwort: Re:" prefixes, though.

> BTW, a footer is separated with/preceded by "-- " (without the quotes) :-) not by a line full of "-".

I know, but yet again, there is nothing I can do about it. Well, I could, but I would risk getting a phone call from management if they ever found out... This footer is mandatory in our company. Even though I know it does not match the usual netiquette, I think I am valuable enough to this mailing list that people can bear with it ;)

regards
sash

PS: Yes, I could manually add a "-- " at the end of every post of mine. Sorry, but I hope everyone understands that I am SO not going to do it, for the sake of laziness ;)
Antwort: [Nagios-users] Premature end of script headers
[EMAIL PROTECTED] wrote on 04.01.2006 11:18:58:

> However, the web interface doesn't seem to allow the CGIs to execute. I can get to http://serverIP/nagios/
> The index page and documentation load fine, but I get the following errors when trying to load ANY of the CGIs:
>
> Premature end of script headers: /usr/local/nagios/sbin/status.cgi
> Premature end of script headers: /usr/local/nagios/sbin/extinfo.cgi
> Premature end of script headers: /usr/local/nagios/sbin/tac.cgi

Well, you obviously gave yourself the answer already. How about allowing Apache to execute the CGIs? ;)

sash
Antwort: Re: [Nagios-users] Installing Nagios RPM
[EMAIL PROTECTED] wrote on 04.01.2006 18:40:10:

> Doing rpm -q nagios shows
> nagios-1.2-2.1.el3.rf
>
> How do I remove that?
>
> Doing rpm -e nagios
> Shows:
> error reading information on service nagios: No such file or directory
> error: %preun(nagios-1.2-2.1.el3.rf) scriptlet failed, exit status 1

http://learn.to/quote
Top-posting with full quotes is considered rude.

It seems you managed to break your Nagios installation. If I had to guess: you installed the Nagios RPM (obviously without knowing you did) and then tried to do some manual installing, destroying vital information that the RPM needs, for example for a successful uninstall.

It looks like you have some major problems here, and it's hard to solve them remotely without access to the machine. There are a myriad of places you would need to check to find where you broke the installation.

regards
sash
Antwort: [Nagios-users] possible bug: escalations don't work if state changes
[EMAIL PROTECTED] wrote on 12.12.2005 16:22:34:

> I am using Nagios 2.0b6, but also experienced this issue in 2.0b4. Nagios escalations do not seem to work when the state changes after a maximum notifications level has been reached. For example, if a

You are misled. Your escalations are only defined up to notification 5; at notification 6 they end and Nagios reverts to the base definition of the service.

> define hostescalation{
>         host_name               test-server
>         first_notification      5
>         last_notification       5
>         notification_interval   0
>         contact_groups          oncall,backup
>         }
>
> define serviceescalation{
>         host_name               test-server
>         service_description     /MYSQL
>         first_notification      5
>         last_notification       5
>         notification_interval   0
>         contact_groups          oncall,backup
>         }
>
> Please let me know if I need to include more information. Thanks in advance.

Changing last_notification to 0 in those cases should produce your desired effect, if I understood you correctly.

regards
sash
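Applied to the two definitions quoted in this message, the suggested change might look like the sketch below. Only last_notification differs from the original post; all other values are copied unchanged, and whether they suit the poster's setup is not something this example can confirm. Per the Nagios documentation, a last_notification of 0 means the escalation has no upper bound and stays in effect for every notification from first_notification onward.

```
# Sketch of the suggested fix: only last_notification is changed.
define hostescalation{
        host_name               test-server
        first_notification      5
        last_notification       0       ; 0 = keep using this escalation, no upper bound
        notification_interval   0       ; unchanged from the original post
        contact_groups          oncall,backup
        }

define serviceescalation{
        host_name               test-server
        service_description     /MYSQL
        first_notification      5
        last_notification       0       ; 0 = keep using this escalation, no upper bound
        notification_interval   0       ; unchanged from the original post
        contact_groups          oncall,backup
        }
```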
Antwort: [Nagios-users] nagios not running host check_command
[EMAIL PROTECTED] wrote on 28.11.2005 14:06:26:

> I have about 20 servers for which I have no specific services to monitor, but I am interested in their host status, i.e. the check_command in the host entry in the hosts.cfg file is set to check-host-alive.
>
> However, it appears that the host check_command is never executed, and availability for the host is always
>
> Undetermined - Insufficient Data 1d 0h 0m 0s 100.000%
>
> Also, the status in Host Status Details is always pending, though there are no other checks configured for this host.
>
> If I explicitly add a check-host-alive for the host in a service configuration, then the host appears as UP in the Host Status Details list, but obviously I would expect it to appear as UP due to the check_command.
>
> Any ideas on whether this is the expected behaviour for the check_command directive?

It is the expected behaviour. Nagios is a network monitoring tool, not a host-only monitoring tool. It expects that a host must have some kind of service running to be useful, which is more or less true in 99.99% of cases ;)

Therefore Nagios works like this:

1. Run the defined service checks on all defined hosts.
2. If, and only if, one of those checks fails, issue a check-host-alive to see whether the problem lies with the service or the host is down.

It is safe to assume that a host is alive if its services are responding. Checking again whether the host is alive is redundant unless one of its services fails.

So you need to set up some kind of service for each host to have it appear as UP. There _is_ a configuration option to enforce check-host-alive checks, but alas I cannot remember it. The documentation is your best friend in this matter (or just look through the config file for this particular option).

regards
sash
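The workaround described in the question, a service that reuses check-host-alive so that each host has at least one scheduled check, could be sketched roughly as follows. The host name, the service description, and the generic-service template are placeholders assumed for illustration and are not taken from the original post.

```
# Hypothetical service entry; "server01", "PING" and "generic-service" are placeholders.
define service{
        use                     generic-service   ; assumed template supplying the remaining required directives
        host_name               server01
        service_description     PING
        check_command           check-host-alive  ; same command the host entry already references
        }
```

With one such service per host, the regular service check gives Nagios a result to work from, so the host should no longer stay in the pending state.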