[Nagios-users] Reply: How to not send out first service notifications?

2006-10-19 Thread srunschke
[EMAIL PROTECTED] wrote on 19.10.2006 11:04:19:

 I am monitoring some hosts on the Internet for informational reasons.
 Since these hosts quite frequently have failed services, I'd like my
 Nagios to refrain from notifying me if a service is down at the first
 notification. Subsequent notifications, however, should be sent out.
 
 Is there an easier way to do this than setting no notifications in the
 service definition and using a service escalation that holds the list
 of contacts that used to be in the service definition?

If you only ever want the second notification, then your approach sounds
wrong. Rather than always suppressing the first notification, consider
raising the number of consecutive failed checks required before a hard
state is reached (max_check_attempts), so that you get fewer false alarms.
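To illustrate the suggestion, here is a minimal Python sketch of how the attempt counter delays the first notification. It assumes the usual Nagios behaviour that a failing service stays in a SOFT state until max_check_attempts consecutive non-OK results have been seen, and that only the HARD state triggers a notification. The function name and the sample check results are made up for illustration.

```python
def first_notification_attempt(results, max_check_attempts):
    """Return the 1-based index of the check that triggers the first
    problem notification, or None if the failure never turns HARD.

    results is a sequence of booleans: True = check OK, False = check failed.
    """
    failures = 0
    for index, ok in enumerate(results, start=1):
        if ok:
            failures = 0  # a recovery resets the attempt counter
            continue
        failures += 1
        if failures >= max_check_attempts:
            return index  # HARD state reached: a notification goes out
    return None


# A flaky host (hypothetical data): two short glitches, then a real outage.
flaky = [False, True, False, False, True, False, False, False, False, False]

print(first_notification_attempt(flaky, 1))  # notifies on the very first glitch
print(first_notification_attempt(flaky, 4))  # only the sustained outage notifies
```

With a higher attempt count the short glitches never reach a hard state, so no notification is sent for them, while a sustained failure still notifies.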

regards
Sascha

--
Sascha Runschke
Netzwerk Management
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:[EMAIL PROTECTED]

http://www.abit.net
http://www.abit-epos.net
-
Sicherheitshinweis zur E-Mail Kommunikation /
  Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html

-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Reply: Service escalation for service groups?

2006-10-19 Thread srunschke
[EMAIL PROTECTED] wrote on 19.10.2006 11:08:58:

 Hi,
 
 In the Nagios 2.x docs, a serviceescalation item can be configured for
 a host name and a service description. Is there any possibility to
 define escalation items that automatically apply to all members of a
 service group?

Example:

define serviceescalation {
    servicegroup_name       WUT-SERVICEGROUP
    first_notification      1
    last_notification       0
    contact_groups          HOST-CONTACTGROUP-SMS,HOST-CONTACTGROUP-MAIL
    notification_interval   10
    escalation_period       24x7
    escalation_options      w,c,r
}

regards
Sascha



[Nagios-users] Reply: UTF8 Japanese characters in macros

2006-10-18 Thread srunschke
[EMAIL PROTECTED] wrote on 18.10.2006 04:26:19:

 /usr/bin/printf %b * Nagios  *\n\nNotification Type:
 $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress:
 $HOSTADDRESS$\n$HOSTNOTES$\n\nInfo: $HOSTOUTPUT$\n\nDate/Time:
 $LONGDATETIME$\n | /usr/bin/mail -s Host $HOSTSTATE$ alert for
 $HOSTNAME$! $CONTACTEMAIL$
[SNIP]
 Both printf and mail work fine with UTF8 Japanese, and tests using only
 the $HOSTNOTES$ macro yield similar results (nothing output).
 The CGIs work fine and display the Japanese, so it seems the macro
 processing is not able to handle the characters.
 
 Has anyone had similar problems, and is there a solution?

I can't test Japanese UTF-8 myself, but are you sure that /usr/bin/printf
supports UTF-8? Typing printf at the shell is not the same as running
/usr/bin/printf. The bash built-in printf differs considerably from the
external executable and has already caused me plenty of headaches.
It might be the source of your problem too.

Sascha

PS: This problem is one of the reasons I strongly advise against the way
the default notifications are currently generated...



[Nagios-users] Reply: Re: security suid/sudo plugins

2006-09-04 Thread srunschke
[EMAIL PROTECTED] wrote on 02.09.2006 18:06:47:

 To make things clearer, the setup I'm proposing is this:
 
 1. # /usr/local/sbin/visudo
 ...
 nagios  ALL=(ALL) NOPASSWD: /usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/etc/check_logfiles.cfg
 
 2. # vi /usr/local/nagios/etc/nrpe.cfg
 ...
 command[check_logfiles]=/usr/local/bin/sudo /usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/etc/check_logfiles.cfg
 
 3. # grep nagios /etc/passwd
 nagios:x:1123:100:Nagios Remote User:/usr/local/nagios:/usr/bin/bash
 
 Note to Hari: my understanding is that sudo won't work for an account
 that doesn't have a valid shell. Certainly all my testing led me to
 that conclusion.
 
 4. # passwd -l nagios
 
 It's not clear to me exactly what the security risk is. Is the idea that
 someone may gain access to an unprivileged account on the system and
 then use this access and this Nagios plugin to cause malicious damage?
 Or to break the root account? In which case, it would all come down to
 how secure the code of the plugin is. Is this correct?

Looks OK so far; you just have to take care of one BIG issue.
/usr/local/nagios/libexec/check_logfiles MUST NOT be owned by
the nagios user/group, and the nagios user/group MUST NOT have
write permission on it.
Imagine someone doing:
cp /usr/bin/bash /usr/local/nagios/libexec/check_logfiles

As for the security of the plugin code itself, you're more or less
on the safe side here. Since you hardcoded the parameters of the
root call, you cannot suffer from buffer overflows caused by malicious
parameters. Exploiting the plugin via the logfiles themselves is
highly unlikely, and would in any case mean that someone has already
compromised the system - otherwise he couldn't forge syslog entries ;)
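The ownership rule above can be checked mechanically. The following Python sketch is only an illustration under POSIX permission semantics: the helper name writable_by is hypothetical, and the demo runs against a temporary file rather than the real plugin path. A user who can write to the containing directory can also replace the file, so the directory deserves the same check.

```python
import os
import stat
import tempfile


def writable_by(path, uid, gids):
    """Return True if the given uid, or a member of the given gids, could
    modify the file or directory at path, judging by owner and mode bits."""
    info = os.stat(path)
    if info.st_uid == uid:
        return True  # the owner can always chmod the file back to writable
    if info.st_gid in gids:
        return bool(info.st_mode & stat.S_IWGRP)
    return bool(info.st_mode & stat.S_IWOTH)


# Demo on a throwaway file standing in for the plugin (hypothetical setup).
demo_dir = tempfile.mkdtemp()
demo_plugin = os.path.join(demo_dir, "check_logfiles")
open(demo_plugin, "w").close()
os.chmod(demo_plugin, 0o755)

me = os.getuid()
print(writable_by(demo_plugin, me, set()))      # owner: can overwrite it
print(writable_by(demo_plugin, me + 1, set()))  # unrelated user: cannot
```

On a real system one would call it with the uid and gids of the nagios account, for both the plugin and the libexec directory, and expect False in each case.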

regards
Sascha



[Nagios-users] Reply: Re: Recovery not getting sent during downtime?

2006-08-02 Thread srunschke
[EMAIL PROTECTED] wrote on 31.07.2006 19:42:08:

 Pardon me. But what is the problem? You have a problem. It triggers an
 alert. You act and 'fix' it by scheduling downtime. Then you bring the
 system alive within the allocated maintenance window.

I'm sorry, but I have a hard time pardoning such rude answers.

 It seems there is little point in sending a status change if you declare
 a maintenance window. Then it is obviously a planned action and there is
 no need to send out any alert.

There is a big point in sending a RECOVERY even during scheduled downtime.
I already mentioned it, but I don't mind explaining it again.
If a service failed before a downtime, it needs to send out recovery
notifications even while in downtime. Take my example: a service goes
critical, every admin gets notified, the host goes into downtime,
the host reboots, the service goes OK after the reboot, and no
notifications get sent because the host is still in downtime.
Admins who are currently offsite can't know about the reboot
(because it happened during downtime and no one noticed except the person
who rebooted) and will never get notified about the recovery. That poses a
big problem for bigger companies that have numerous admins receiving
notifications from Nagios. When such a case occurs, I usually get a call
from my CTO asking why problem XY hasn't been fixed yet, and I have to
tell him: I fixed it with a reboot, but Nagios didn't send out the recovery
because of the downtime.

It wastes time and resources.
Time and resources mean money.
Wasted money is bad.

sincerely
Sascha



[Nagios-users] Reply: Re: Reply: Re: Recovery not getting sent during downtime?

2006-08-02 Thread srunschke
[EMAIL PROTECTED] wrote on 02.08.2006 18:04:06:

 The problem is that it sounds like you're using scheduled downtimes
 incorrectly. It's not meant to be used for *un*scheduled downtimes; thus
 the name.  It's meant to suppress alerts from a machine during the
 specified window, and that's exactly what it's doing in your case.

It is a scheduled downtime. I put the host into downtime, because I
was planning to reboot it. I do not want notifications to be sent
out for the reboot of course, so I am forced to set a downtime.

 I can tell you that I'd be really annoyed if it didn't work as advertised,
 and *did* send alerts in the middle of the night when I was working on a
 box and someone else was carrying the pager (well, I might not be the
 guy to get annoyed, but I'd hear about it in the morning).

I'm not talking about sending alerts. I'm talking about sending
recoveries for alerts that happened _before_ the downtime, not
for suppressed alerts _during_ the downtime. For those a recovery
should of course never be sent - as no alert has been sent.

 It sounds like you should probably be acknowledging critical services,
 rather than marking them as being in scheduled downtime when they're
 not.  That way the alerts stop until the service comes back up, and
 you'll be notified when it changes state.

If I acknowledge the problem, everyone gets a notification too.
Where's the benefit?
And acknowledging the problem doesn't make any difference.

Service goes critical
SMS gets dispatched
Problem gets acknowledged
SMS gets dispatched
Host gets scheduled for downtime
Reboot
Host/Service OK

Still no notification for the rest of the admins that the service
is fixed. For them it looks like it's acknowledged and I'm working on
it - but no sign ever that I fixed it.

The problem persists and I make no progress, as the outcome stays the
same: no one gets notified that the problem was fixed, and people will call
to ask whether I am still working on it.

I still say: for every WARNING/CRITICAL/UNKNOWN that has been sent, a
RECOVERY must follow if the service/host recovers. That's the expected
behaviour, and anything else is rather inconsistent behaviour in my opinion
(unless I explicitly suppress all notifications for the service/host in
question - but downtime shouldn't work this way).
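The rule argued for here can be written down as a small state machine. This Python sketch models the proposal only, not the behaviour of Nagios 2.0b3 as described in this thread. The class name, the method name and the reduction to a single "OK / not OK" state are simplifying assumptions.

```python
class NotificationFilter:
    """Decide whether a notification goes out under the proposed rule:
    every problem that was announced must also get its recovery."""

    def __init__(self):
        self.problem_notified = False  # has a problem notification gone out?

    def should_notify(self, state, in_downtime):
        if state != "OK":
            if in_downtime:
                return False  # new problems stay suppressed during downtime
            self.problem_notified = True
            return True
        # Recovery: send it only if somebody was told about the problem,
        # regardless of whether the host is in downtime right now.
        send = self.problem_notified
        self.problem_notified = False
        return send


# The scenario from this thread: alert first, downtime and reboot afterwards.
service = NotificationFilter()
print(service.should_notify("CRITICAL", in_downtime=False))  # alert sent
print(service.should_notify("CRITICAL", in_downtime=True))   # suppressed
print(service.should_notify("OK", in_downtime=True))         # recovery sent
```

A problem that first appears inside the downtime window never sets the flag, so its recovery stays silent as well, which matches the distinction drawn earlier in this message.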

regards
Sascha



[Nagios-users] Reply: Recovery not getting sent during downtime?

2006-08-02 Thread srunschke
[EMAIL PROTECTED] wrote on 02.08.2006 18:36:18:

 I think you might want to rethink your process so that it matches the
 paradigm of the tool you are using. It already can do what you want if
 you just work with the way it was designed, rather than forcing a
 process that breaks the paradigm.

I think you might want to re-read my mail, it seems you did
not fully understand what I meant ;)

 Service goes critical
 This is not a scheduled event. It is unscheduled.

I never said anything else.
I never said I scheduled a downtime for the SERVICE.
I said I scheduled a downtime for the HOST, because I was
forced to reboot the HOST to fix the SERVICE problem.
I didn't have the option of acknowledging the service,
fixing it and being happy. Sometimes certain services on some
broken OS tend to cripple themselves and only a reboot
can fix them. If I do not schedule a HOST downtime, then
SMSs get dispatched for the HOST going down and coming up
and then for the service recovering. Not exactly the
behaviour I'd like to see.

 Since the unscheduled event has already been acknowledged and everyone
 who might want to jump in to help already knows it is being handled,
 there is no need to schedule a downtime for an unscheduled event. Just
 acknowledge it. 
 
 Reboot
 Host/Service OK
 Recovery Note goes out to everyone, including CTO. Problem Solved.
 Everyone knows what is happening. Your performance evaluation gets a
 boost for being the one to solve the problem.

Uhm, and what about the 2 SMSs going out to everyone stating that the
host went down and came up again? That is neither expected (by the rest
of the admins, for example) nor wanted behaviour.

I DO know what the documentation says regarding scheduled downtimes.
I DO know it says it suppresses all alerts.
Yet I still say it should not suppress recoveries for critical/warning/
unknown states which occurred before a scheduled downtime. Somewhere else
in the documentation (or was it Ethan somewhere else?) it states that for
every alert sent, there will be a recovery if it goes OK again.
And alas, Ethan seems to agree with me, so I can't be that wrong, eh?

regards
Sascha



[Nagios-users] Reply: Recovery not getting sent during downtime?

2006-08-02 Thread srunschke
[EMAIL PROTECTED] wrote on 02.08.2006 18:42:56:

 Or you could temporarily disable notifications for the host during the
 reboot.  The Nagios docs are pretty clear that "[w]hen a host or service
 is in a period of scheduled downtime, notifications for that host or
 service will be suppressed."  It's working as designed.

It's working exactly as phrased in this part of the documentation.
But we're not the church and the documentation ain't the bible,
therefore it can be faulty ;)

All I'm saying is that the documented and observed behaviour
does not make all that much sense when viewed from certain angles.

 I think we also have different definitions of scheduled.  That's not a
 word I would use to describe rebooting a box to bring a failed service
 back up.

Well yes, that might be.
But if I understand you correctly, your definition of scheduled
means that it has to be planned long in advance for a certain time period.
That's not what it's meant for. There's a reason why you cannot enter
fixed downtimes for certain periods, as in scheduling them ahead.
You can only schedule downtimes instantly with a click; they're
meant to be used like that - they don't leave you any other choice.

I do not really see a difference between rebooting a machine because
of a BIOS update and rebooting it because some service hangs. It's a
reboot and I planned it all by myself, of my own free will. Therefore I
will schedule a downtime, as I know: it will go down when I press that
reboot button.

That's a scheduled downtime for me.

 But you are talking about sending alerts.  Recovery alerts are still
 alerts, and alerts are suppressed during scheduled downtimes.

Well, that's playing with words. Of course I'm talking about
critical/warning/unknown/down when talking about alerts. I thought
that was rather clear from my wording. Alerts are critical, warning,
unknown, down. Recoveries are recoveries. Alerts and recoveries
together are notifications. But recoveries are not alerts in my
understanding.

  (unless I explicitly suppress all notifications for the 
  service/host in question - but downtime shouldn't work this way)
 
 But it does.  You're expecting it to do something other than what it's
 designed for, and to behave in a way other than how it's documented.

Well, that's the question. Was it designed that way? I don't think so.
Is it documented and implemented that way? Yes, it is.
Though the latter is true, that doesn't mean I cannot argue for
my point, as it is a much better approach to the problem. The notification
system must be made aware of why a certain notification gets sent and
decide whether it has to send it or not. Scheduled downtime should not
blindly block everything. If it did, then where would be the difference
between scheduling downtime and just disabling notifications for a host
or service?

regards
Sascha

PS: Why hasn't HP released a new Centrino driver for their notebooks
yet? We are in dire need of an update because of the newly discovered
security hole ;)



[Nagios-users] Recovery not getting sent during downtime?

2006-07-31 Thread srunschke
(repost from nagios-devel)

Hi folks,

I'm currently using Nagios 2.0b3 (never change a running system ;)) and 
ran into the following problem:

Service went critical
SMS and emails got dispatched
found problem, decided to reboot the machine to fix it
scheduled downtime for host
rebooted host
everything went ok again
no SMS/email got dispatched to state the service recovered though!

I'm unsure whether this problem has already been fixed; I didn't find any
real evidence on Google or in the changelogs. Fixes to the recovery logic
and the notification system itself were documented, but not in much detail.

Question: is this a bug or feature? If it is a bug, has it been fixed in
a newer release which I can update to?

It poses a problem for us, as admins who are currently offsite don't get
a message that the problem has already been resolved. So we get quite a few
unnecessary phone calls to check on a problem that is already solved.

Here's an excerpt of how it looked in the Nagios log:

[1153954542] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153954600] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;2;Connection refused
[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused
[1153954660] SERVICE NOTIFICATION: RGingter;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: MArslan;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: IT_Service;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153955260] SERVICE NOTIFICATION: RGingter_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
[1153955260] SERVICE NOTIFICATION: MArslan_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
...rest of alerts snipped out...
[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;1153980509;1153981829;1;0;7200;technik;Neustart MAr
[1153980519] HOST DOWNTIME ALERT: NSEXT01;STARTED; Host has entered a period of scheduled downtime
[1153980595] HOST ALERT: NSEXT01;DOWN;SOFT;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980605] HOST ALERT: NSEXT01;DOWN;SOFT;2;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] HOST ALERT: NSEXT01;DOWN;HARD;3;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] SERVICE ALERT: NSEXT01;PING;CRITICAL;HARD;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980687] SERVICE ALERT: NSEXT01;CPU;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;UPTIME;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;DISK_C;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980707] HOST ALERT: NSEXT01;UP;HARD;1;OK - 10.150.1.2: rta 1.382ms, lost 0%
[1153980707] SERVICE ALERT: NSEXT01;PING;OK;HARD;1;OK - 10.150.1.2: rta 3.307ms, lost 0%
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;MEMUSE;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_D;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_E;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153980976] SERVICE ALERT: NSEXT01;CPU;OK;HARD;1;CPU Load 37% (10 min average)
[1153980976] SERVICE ALERT: NSEXT01;UPTIME;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 5 minute(s)
[1153980976] SERVICE ALERT: NSEXT01;DISK_C;OK;HARD;1;C:\ - total: 3.00 Gb - used: 2.05 Gb (68%) - free 0.95 Gb (32%)
[1153981105] SERVICE ALERT: NSEXT01;MEMUSE;OK;SOFT;2;Memory usage: total:1951.26 Mb - used: 434.44 Mb (22%) - free: 1516.82 Mb (78%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_D;OK;SOFT;2;D:\ - total: 5.43 Gb - used: 2.46 Gb (45%) - free 2.97 Gb (55%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_E;OK;SOFT;2;E:\ - total: 67.83 Gb - used: 14.92 Gb (22%) - free 52.91 Gb (78%)
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime
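For anyone who wants to sift through such an excerpt, the lines follow a regular shape: an epoch timestamp in brackets, an entry type, a colon, then semicolon-separated fields. Below is a minimal Python sketch that splits them up. The function name is hypothetical and the two sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime, timezone

# "[epoch] ENTRY TYPE: field;field;..."
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_log_line(line):
    """Split one log line into (UTC datetime, entry type, list of fields).
    Returns None for lines that do not match, e.g. wrapped continuations."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    epoch, entry_type, payload = match.groups()
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return when, entry_type, payload.split(";")


sample = [
    "[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused",
    "[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second "
    "response time on port 1352",
]

for raw in sample:
    when, entry_type, fields = parse_log_line(raw)
    host, service, state, state_type, attempt = fields[:5]
    print(when.isoformat(), entry_type, host, service, state, state_type, attempt)
```

Filtering the parsed entries by service and entry type makes it easy to line up which state changes were followed by a SERVICE NOTIFICATION and which were not.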

Any insight in this would be appreciated.

sincerely
Sascha


Reply: [Nagios-users] How to reduce a very high latency number

2006-05-22 Thread srunschke
[EMAIL PROTECTED] wrote on 17.05.2006 20:09:16:

 I am still butting up against very high latency issues with my Nagios
 setup.  I feel like I must be missing something obvious because it
 doesn't seem like I have so many services that the servers cannot keep
 up.

 nag2: 193/1743

 Machine hardware:
 1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache
 7200rpm drives

To me this is obviously a performance issue related to hardware.
Your machines have far too little RAM. It is simply not possible to
run 1800 checks on a 512MB machine in a timely manner.

Think about this:
Every time Nagios starts a check, it forks a child, which forks the
check. Nagios usually uses about 26MB of total memory per process, the
check maybe another 5MB. When running 1800 checks, we are talking about
spreading roughly 55 GIGAbytes of needed RAM over 512 MB of real RAM.
Imagine how often that works without the machines doing an enormous
amount of swapping and io-wait. I really cannot imagine how such a machine
can NOT swap when running Nagios. Are you totally sure that you did not
make a mistake when checking the machine?
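The estimate above is easy to reproduce. This short Python sketch uses the figures from this message (26MB per forked Nagios process, 5MB per check, 1800 checks, 512MB of RAM) and makes the same worst-case assumption, namely that every check is in memory at once and no pages are shared between processes. The function name is made up for illustration.

```python
def worst_case_memory_mb(checks, nagios_fork_mb, plugin_mb):
    """Memory needed if every check held its own fork and plugin at once."""
    return checks * (nagios_fork_mb + plugin_mb)


total_mb = worst_case_memory_mb(checks=1800, nagios_fork_mb=26, plugin_mb=5)
physical_mb = 512

print(f"needed:   {total_mb} MB (about {total_mb / 1024:.1f} GB)")
print(f"physical: {physical_mb} MB")
print(f"overcommit factor: about {total_mb / physical_mb:.0f}x")
```

In practice the checks are spread over the check interval rather than run simultaneously, so the real peak depends on how many run in parallel; the same function can be called with that smaller number.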

Here are the figures from our dedicated Nagios server, which is a minimal
install of RHES4 with only Nagios/Apache running on it (and the HP
Insightmanager tools and TSM backup client, but that should not really
matter that much ;)):

top - 11:48:52 up 69 days, 19:10,  1 user,  load average: 0.75, 0.70, 0.67
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.3% us,  4.3% sy,  0.0% ni, 62.5% id, 23.9% wa,  0.0% hi,  0.0% si
Mem:   3116384k total,  2341696k used,   774688k free,    55188k buffers
Swap:  6291448k total,      144k used,  6291304k free,  2148772k cached

This is an HP DL380 with a 3.6 GHz Xeon, 3GB of RAM and a RAID5. It is
currently running only 120 hosts with around 500 checks, but those are on a
high-frequency schedule - ~400 checks per minute - as those are the
company-critical services. Therefore it is under real pressure, as you can
see from the 2.3GB memory usage and the 0.75 load with only 500 checks. But
I think it is roughly comparable to your triple amount of checks.

You should really, really upgrade the RAM in the machines. In my opinion
that would solve most of your problems, as I imagine you have a lot of
io-wait on this machine (which you can check with an up-to-date top, by the
way ;))

regards
Sascha



Reply: [Nagios-users] check_dig bug ?

2006-04-10 Thread srunschke
 [EMAIL PROTECTED] stucky]# /usr/local/nagios/libexec/check_dig -w 1 -c 2 -H {nameserver} -l {fqdn} -a {ip}
 DNS WARNING - 0.011 seconds response time ({fqdn} 38400 IN A {ip})|time=0.011227s;1.00;2.00;0.00
 
 Have I totally gone nuts or did I not just tell check_dig to only warn me
 if the query takes more than one second? As you can see, the tool itself
 reports it took only 0.011 seconds, so why the warning?
 It's annoying because I get those random fake alerts and then a recovery
 message soon after.
 Thing is, even if I leave the -w and -c flags out entirely it'll still do
 that, as if it had a hardcoded value between 0.008 and 0.011 in there
 that can't be changed.

This is a Red Hat issue: they screwed something up in the futex
handling of their kernel. Do a manual up2date -uf to force the update
of the kernel packages, which are usually excluded by up2date.
Reboot - be happy. This was fixed with hotfix-kernel-2.6.9-22 (not
publicly available) and later on in the most recent major update,
Update 3 (Nahant).
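That the warning cannot come from the -w/-c values can be read off the performance data in the quoted output. The Python sketch below parses that one item and compares the value against its own thresholds. It assumes the common "label=value[unit];warn;crit;min" layout and a plain "greater than" threshold check; the function names are hypothetical.

```python
import re


def parse_perfdata(item):
    """Parse one 'label=value[unit];warn;crit;min' performance data item."""
    label, _, rest = item.partition("=")
    fields = rest.split(";")
    match = re.match(r"^(-?\d+(?:\.\d+)?)(\D*)$", fields[0])
    if match is None:
        raise ValueError("unparseable value: %r" % fields[0])

    def number(position):
        # Thresholds are optional and may be left empty.
        if position < len(fields) and fields[position] != "":
            return float(fields[position])
        return None

    return {
        "label": label,
        "value": float(match.group(1)),
        "unit": match.group(2),
        "warn": number(1),
        "crit": number(2),
    }


def expected_state(perf):
    """State the thresholds alone would produce for this value."""
    if perf["crit"] is not None and perf["value"] > perf["crit"]:
        return "CRITICAL"
    if perf["warn"] is not None and perf["value"] > perf["warn"]:
        return "WARNING"
    return "OK"


perf = parse_perfdata("time=0.011227s;1.00;2.00;0.00")
print(perf["value"], perf["warn"], perf["crit"], expected_state(perf))
```

The reported time is far below both thresholds, so the thresholds alone would yield OK, which fits the explanation that the spurious WARNING originates outside the plugin's own comparison.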

regards
Sascha



Antwort: [Nagios-users] Issue with Check_ssh

2006-03-20 Thread srunschke

[EMAIL PROTECTED] wrote on 20.03.2006 16:01:10:

 I am using check_ssh to check whether ssh is running on some hosts.
 On a number of hosts the check runs OK; on others I am getting
 check_ssh: Could not parse arguments. I can log on using ssh to the
 hosts that are giving me the parsing error.
 Can anyone help me resolve this Nagios error?

Well, maybe we could - but not if you do not provide us with the
configuration of the service in question. It seems obvious that you are
passing wrong arguments to check_ssh - at least that's what it says.
Beyond that there isn't much to guess without seriously consulting my
mystic crystal ball.

regards
sash



Antwort: Re: Antwort: [Nagios-users] Issue with Check_ssh

2006-03-20 Thread srunschke

Gordon Stewart [EMAIL PROTECTED] wrote on 20.03.2006 16:23:22:

 The command line I am having the trouble with is 
 
 define command{
     command_name  check_ssh
     command_line  $USER1$/check_ssh -t 10 -p 99 $HOSTADDRESS$
 }
 
 As I said it works with some hosts but not all hosts.

Well, then you would do well to post the other relevant info too, like
the service definition and the host definition. It looks as if
$HOSTADDRESS$ sometimes expands to something wrong; maybe you mixed up
the address and the alias in a few host definitions?
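
As a rough sketch of what to compare, a host definition might look like the
one below. The host name, alias text and IP are made up for illustration, and
generic-host is assumed to be a template that already exists in your config.
The point is only that $HOSTADDRESS$ is filled from the address directive, so
that line has to hold something check_ssh can actually reach.

```
# Hypothetical host entry - host name, alias and IP are placeholders.
define host{
    use         generic-host    ; assumed template providing the other required directives
    host_name   sshbox01
    alias       SSH test box    ; free-text description, not meant for checks
    address     192.0.2.10      ; value that $HOSTADDRESS$ expands to
}
```

If the alias text ended up in the address line on the failing hosts, that
would be one possible explanation for the parse error.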

regards
sash



Re: [Nagios-users] Installing Nagios RPM

2006-01-09 Thread srunschke
[EMAIL PROTECTED] wrote on 05.01.2006 10:56:11:

 Is it seen as recommended behaviour to use Antwort: instead of Re:?
 Especially when sending mail to a mailing list that runs in english?
 
 SCNR ;-)

I didn't take any offense ;)
But it's not something I can do anything about.
Lotus Notes sets it and you cannot change it by any means.
But I do manually edit the subject if it has too many
Antwort: Re: Antwort: Re: prefixes.

 BTW, a footer is separated with/preceded by "-- " (without the quotes) :-)
 not by a line full of "-".

I know, but yet again, there is nothing I could do about it. Well
I could, but I would risk getting a phonecall from the management
if they ever find out...

This footer is mandatory in our company. Even though I know it
does not conform to the usual netiquette, I think I am valuable
enough to this mailing list that people can bear with it ;)

regards
sash

PS: Yes, I could manually add a "-- " at the end of every post of mine.
Sorry, but I hope everyone understands that I am SO not going to do
it, for the sake of laziness ;)



Antwort: [Nagios-users] Premature end of script headers

2006-01-04 Thread srunschke
[EMAIL PROTECTED] wrote on 04.01.2006 11:18:58:

 However, the web interface doesn't seem to allow the CGIs to execute. I can
 get to http://serverIP/nagios/
 
 The index page and documentation load fine, but I get the following errors
 when trying to load ANY of the CGIs:
 
 Premature end of script headers: /usr/local/nagios/sbin/status.cgi
 Premature end of script headers: /usr/local/nagios/sbin/extinfo.cgi
 Premature end of script headers: /usr/local/nagios/sbin/tac.cgi

Well, you obviously gave yourself the answer already.
How about allowing apache to execute the cgi's? ;)

sash



Antwort: Re: [Nagios-users] Installing Nagios RPM

2006-01-04 Thread srunschke
[EMAIL PROTECTED] wrote on 04.01.2006 18:40:10:

 Doing rpm -q nagios shows nagios-1.2-2.1.el3.rf
 
 How do I remove that?
 
 Doing rpm -e nagios
 
 Shows:
 error reading information on service nagios: No such file or directory
 error: %preun(nagios-1.2-2.1.el3.rf) scriptlet failed, exit status 1

http://learn.to/quote

Top-posting with full quotes is considered rude.

It seems you managed to screw up your Nagios installation.
If I had to guess: you installed the Nagios rpm (apparently without
realizing it) and then tried a manual install on top, destroying
vital information that the rpm needs, for example for a successful
uninstall.
It looks like you have some major problems here, and it's rather hard
to solve them remotely without access to the machine. There are a
myriad of places you would need to check to find where you broke the
installation.

regards
sash



Antwort: [Nagios-users] possible bug: escalations don't work if state changes

2005-12-12 Thread srunschke
[EMAIL PROTECTED] wrote on 12.12.2005 16:22:34:

 I am using Nagios 2.0b6, but also experienced this issue in 2.0b4.
 Nagios escalations do not seem to work when the state changes after a
 maximum notifications level has been reached. For example, if a

You are mistaken. Your escalations are only defined for notification 5;
from notification 6 on they no longer apply and Nagios reverts to the
base definition of the service.

 define hostescalation{
     host_name              test-server
     first_notification     5
     last_notification      5
     notification_interval  0
     contact_groups         oncall,backup
 }
 
 define serviceescalation{
     host_name              test-server
     service_description    /MYSQL
     first_notification     5
     last_notification      5
     notification_interval  0
     contact_groups         oncall,backup
 }
 
 Please let me know if I need to include more information. Thanks in advance.

Changing last_notification to 0 in those definitions should
produce the effect you want, if I understood you
correctly.
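
For illustration, the quoted service escalation with only that one value
changed might look as follows. Everything else is copied from the definition
above; the same change would apply to the hostescalation. Treat it as a
sketch, not a tested config.

```
# Quoted serviceescalation with last_notification set to 0.
define serviceescalation{
    host_name              test-server
    service_description    /MYSQL
    first_notification     5
    last_notification      0    ; 0 = no upper bound, the escalation never goes out of effect
    notification_interval  0    ; unchanged from the quoted definition
    contact_groups         oncall,backup
}
```

Note that notification_interval is left at 0 here as in the original; as far
as I know that value suppresses repeat notifications while the escalation is
in effect, so it may be worth checking against the documentation as well.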

regards
sash




Antwort: [Nagios-users] nagios not running host check_command

2005-11-28 Thread srunschke
[EMAIL PROTECTED] wrote on 28.11.2005 14:06:26:

 I have about 20 servers for which I have no specific services to monitor, but
 whose host status I am interested in, i.e. the check_command in the host
 entry in the hosts.cfg file is set to check-host-alive.
 
 However, it appears that the host check_command is never executed, and
 availability for the host is always "Undetermined - Insufficient Data 1d 0h 0m
 0s 100.000%".
 
 Also, the status in Host Status Details is always pending, though there are no
 other checks configured for this host.
 
 If I explicitly add a check-host-alive for the host in a service configuration,
 then the host appears as UP in the Host Status Details list, but obviously I
 would expect it to appear as UP due to the check_command.
 
 Any ideas on whether this is the expected behaviour for the check_command
 directive?

It is the expected behaviour.
Nagios is a network monitoring tool, not a host-only monitoring tool.
It assumes that a host has to run some kind of service to be useful,
which is more or less true in 99.99% of the cases ;)
Therefore Nagios works like this:

1. Run the defined service checks on all defined hosts.
2. If and only if one of those checks fails, issue a check-host-alive
to see whether the problem is with the service or the host is down.

It is safe to assume that a host is alive if its services are responding.
Checking the host for being alive again is redundant unless one of its
services fails.

So you need to set up some kind of service for each host to have it
appear as UP. There _is_ a configuration option to enforce
check-host-alive checks, but alas I cannot remember it. The documentation
is your best friend in this matter (or just look through the config file
for this particular option).
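
A minimal sketch of such a service, assuming you have a generic-service
template and the check_ping command from the sample configuration (both are
assumptions, as are the host name and the thresholds):

```
# Hypothetical catch-all service so that the host gets checked at all.
define service{
    use                  generic-service    ; assumed template providing the other required directives
    host_name            test-server        ; placeholder, use your own host name
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%    ; warning and critical thresholds, example values
}
```

One such entry per host (or one entry listing several hosts in host_name)
should be enough to get them out of the pending state.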

regards
sash
