Re: [Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-29 Thread Marc Powell

On Jun 29, 2009, at 4:20 PM, Kustner, Tom wrote:

> 2. Thanks for pointing out that host checks are not always performed
> unless a service has been detected as failing.  I value the service
> checking, but I assumed it was also pinging the host on a regular basis
> and that is apparently not the case.  I come from the background of
> using products such as Insight Manager and OpenManage which are
> vendor-specific solutions that have their limitations but which
> automatically perform pinging on a regular basis.  I'll look at the
> documentation for information on getting that set for us.  It explains
> my frustration as to why a server can reboot and Nagios not detect it.


Word of warning - you *do not* want to enable regularly scheduled host  
checks under nagios-2.x. The current logic of only checking a host  
when a service is not OK is more than sufficient under normal  
circumstances. Enabling regularly scheduled checks under 2.x will only  
hurt your performance. While service checks can be done in parallel,  
host checks are done serially in that version. While a host is being  
checked, nagios stops *all other activity* until the host check  
completes; other checks, logging, notifications, everything.

To illustrate, if you have 200 hosts and each host check sends 5 pings
(~5 seconds to complete), it will take 200 (hosts) x 5 (seconds) = 1000
seconds just to check your host status. That's over 16 minutes during
which nagios is only checking those hosts and not checking any of the
services on those hosts, sending notifications, or doing anything else.
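
To make the arithmetic above concrete, here is a minimal sketch in Python. The function name and parameters are made up for illustration and are not part of Nagios; it assumes every host check takes the same fixed time and that checks run strictly one after another, as described for 2.x.

```python
def serial_host_check_seconds(hosts: int, seconds_per_check: float) -> float:
    """Total wall-clock time when host checks run one at a time."""
    return hosts * seconds_per_check


# Figures taken from the example above: 200 hosts, ~5 seconds per check.
total = serial_host_check_seconds(hosts=200, seconds_per_check=5)
minutes, seconds = divmod(total, 60)
print(f"{total:.0f} seconds = {minutes:.0f} min {seconds:.0f} s")
# prints "1000 seconds = 16 min 40 s"
```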

Nagios-3.x implements parallel host checks, just like service checks,  
but even then regularly scheduled host checks aren't really needed or  
encouraged and are just a waste of resources that could be used for  
service checks, IMHO.

Even then, unless you're checking _very_ frequently, a modern server
can easily reboot in the time between checks. I'd recommend using
check_snmp as a service check to look at the SNMP-reported uptime and
alert if it's less than a reasonable threshold based on your normal
check interval (say 5-10 minutes typically).
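
As a rough illustration of the uptime idea, the Python sketch below shows only the threshold logic, not an actual check_snmp invocation. It assumes the uptime has already been fetched as SNMP sysUpTime TimeTicks (hundredths of a second); the function name, the 10-minute default threshold and the message text are hypothetical. The return values follow the usual Nagios plugin convention of 0 for OK and 2 for CRITICAL.

```python
from typing import Tuple

OK, CRITICAL = 0, 2  # standard Nagios plugin return codes


def check_uptime(timeticks: int, min_uptime_seconds: int = 600) -> Tuple[int, str]:
    """Flag a probable recent reboot when uptime is below the threshold.

    timeticks is sysUpTime in hundredths of a second (assumed to have been
    fetched elsewhere); min_uptime_seconds is a hypothetical threshold.
    """
    uptime_seconds = timeticks // 100
    if uptime_seconds < min_uptime_seconds:
        return CRITICAL, f"CRITICAL - uptime {uptime_seconds}s, host probably rebooted"
    return OK, f"OK - uptime {uptime_seconds}s"


status, message = check_uptime(timeticks=12_000)  # 120 seconds of uptime
print(status, message)
```

The threshold should be at least as long as the interval at which this service check runs; otherwise a reboot can still slip by between two checks.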

--
Marc


--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-29 Thread Kustner, Tom
Jamie, 

Sorry for the very tardy response - it's been busy.  Two responses to
all your good comments:

1.  The problems with Nagios "slacking on the job" were confirmed by the
Nagios administrator.  As a permanent fix, he is looking to move the setup
to a box with more memory (it is currently a Red Hat box).  As a temporary
measure, he talked about using cron to restart it every night.  Since it's
been more responsive lately, I have a feeling he has done just that.

I did ask whether he thought upgrading the NSClient would help, since
we are currently using NSClient++ 0.3.1 and NSClient++ 0.3.6 was just
released, but while encouraging me to try out 0.3.6, he feels memory
constraints on the current box are the cause of the problem and nothing
else.

I didn't post any config files because I am not the Nagios
administrator and don't have access to the configs or the box itself
except for what I am allowed to see for our hosts via http.  Thanks for
letting me know 2.9 is relatively stable and that it, in all likelihood,
is not the cause of the problem.

2. Thanks for pointing out that host checks are not always performed
unless a service has been detected as failing.  I value the service
checking, but I assumed it was also pinging the host on a regular basis
and that is apparently not the case.  I come from the background of
using products such as Insight Manager and OpenManage which are
vendor-specific solutions that have their limitations but which
automatically perform pinging on a regular basis.  I'll look at the
documentation for information on getting that set for us.  It explains
my frustration as to why a server can reboot and Nagios not detect it.

Thanks again for taking time.

Tom Kustner MCSE, CNE 
Inside: 68728 
Outside: 414-906-8728 
Mobile:  414-559-0889 

-----Original Message-----
From: James Pratt [mailto:jpr...@norwich.edu] 
Sent: Friday, June 19, 2009 8:23 PM
To: Kustner, Tom
Cc: nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] How often does Nagios need restarting?
(Quis custodiet ipsos custodes?)

Hi Tom, I've tried to answer your questions to the best of my own
personal knowledge. To keep this readable, I have removed your original
"*" symbols and used "*" to mark all of my own comments/thoughts below,
since my MS Outlook client apparently just sucks.

Regards,
jamie

-----Original Message-----
From: Kustner, Tom [mailto:tom.kust...@retirementpartner.com] 
Sent: Friday, June 19, 2009 5:35 PM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] How often does Nagios need restarting?
(Quis custodiet ipsos custodes?)

I am a Nagios user, not the administrator.  We are running Nagios 2.9 on
RHEL 4 or 5.  Overall, 200+ hosts with 3000 services being monitored. I
have access for monitoring a smaller number of hosts.

* ok, understood...

In another posting, I alluded to an issue where a host had gone down but
no alert was sent out.   The issue surfaced again today and as was done
the other time, Nagios was restarted to "fix" the problem.   I am
naturally concerned about the unreliability.  

* did you get any on-list or off-list replies at all? You have not
mentioned if you had it resolved or not, but it sounds like the answer is
no to possibly both(?)

Any thoughts on this problem?  Specifically:

What are best practices for making sure Nagios does not fall down on
the job?   Is there something not set right?

* Understanding your setup and the way nagios works is how you ensure it
stands up... a mis-config sounds likely, but who knows...

Are other Nagios administrators restarting Nagios on a weekly or
nightly basis to keep it on the job?

* Heck no! That's why we run it on Linux or Solaris! :)

Is this an issue specific to Nagios 2.9?  Was 2.9 a spotty version?

* Not to my knowledge - all stable releases have worked very reliably
here, especially 2.9 now that I look back...

For a given host, why would "active checks" be enabled, yet "N/A"
appears in the "Next Active Check" field?

* RTM - host checks are not always performed unless service checks fail,
and since I've been a manual-slacker myself, that may not even be the
true correct answer (Marc? :)

Thanks for any help.

-Tom Kustner-

* Not to sound negative/condescending or anything like that, but your
install will truly only work as well as you have maintained
it/understand it. You should really look at your current config files
and read the manual on 2.9, or upgrade to 3.x and again rtm...  Also,
you have not sent anything specific related to your problematic
config(s) for anyone on this list to even guess either way whether or
not something is mis-configured. If you are concerned about posting your
configs/setup, change stuff properly to hide what you need to on-list.
(I apologize if I have missed your earlier posting. Many here try our
best to help people here when possible, but sometimes we are all busy at
the same time, who knows!?).

Cheers,
Jamie



Re: [Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-22 Thread cms.mahape
Don’t send mails again

Atish

-----Original Message-----
From: Angel L. Mateo [mailto:ama...@um.es] 
Sent: Monday, June 22, 2009 11:50 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] How often does Nagios need restarting? (Quis
custodiet ipsos custodes?)

On Fri, 19-06-2009 at 15:35 -0600, Kustner, Tom wrote:
> I am a Nagios user, not the administrator.  We are running Nagios 2.9 on
> RHEL 4 or 5.  Overall, 200+ hosts with 3000 services being monitored. I
> have access for monitoring a smaller number of hosts.
> 
> In another posting, I alluded to an issue where a host had gone down but
> no alert was sent out.   The issue surfaced again today and as was done
> the other time, Nagios was restarted to "fix" the problem.   I am
> naturally concerned about the unreliability.  
> 
Maybe the problem was a previous problematic restart. In our
installation, nagios (nagios3, but previously nagios 1) is working
smoothly. We don't have a problem like yours in general, but we sometimes
see one when we restart nagios to reload the configuration (after adding
hosts or services). Sometimes the restart fails, we end up with several
instances of nagios running, and a similar problem appears. Killing all
nagios processes and starting again fixes the problem.

We configure nagios with centreon and we don't know if this is a problem
related to centreon or to the startup scripts (centreon uses them).

But, if we don't reload nagios, it runs without problems.

-- 
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información   _o)
y las Comunicaciones Aplicadas (ATICA)  / \\
http://www.um.es/atica                  _(___V
Tfo: 868887590
Fax: 86337





Re: [Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-21 Thread Angel L. Mateo
On Fri, 19-06-2009 at 15:35 -0600, Kustner, Tom wrote:
> I am a Nagios user, not the administrator.  We are running Nagios 2.9 on
> RHEL 4 or 5.  Overall, 200+ hosts with 3000 services being monitored. I
> have access for monitoring a smaller number of hosts.
> 
> In another posting, I alluded to an issue where a host had gone down but
> no alert was sent out.   The issue surfaced again today and as was done
> the other time, Nagios was restarted to "fix" the problem.   I am
> naturally concerned about the unreliability.  
> 
Maybe the problem was a previous problematic restart. In our
installation, nagios (nagios3, but previously nagios 1) is working
smoothly. We don't have a problem like yours in general, but we sometimes
see one when we restart nagios to reload the configuration (after adding
hosts or services). Sometimes the restart fails, we end up with several
instances of nagios running, and a similar problem appears. Killing all
nagios processes and starting again fixes the problem.
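
A quick way to spot the several-instances situation described above is to count the running nagios processes. The Python sketch below is only an assumption about how one might do that: it parses a list of command names, one per line, such as the output of `ps -e -o comm=`. The helper name and the sample data are made up, and the process name `nagios` may differ on a given installation. Note that nagios forks children while running checks, so several matches do not by themselves prove a failed restart.

```python
def count_processes(ps_output: str, name: str = "nagios") -> int:
    """Count lines in ps-style output whose command name equals `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


# Made-up sample output, one command name per line.
sample = "init\nnagios\nsshd\nnagios\nnagios\n"
instances = count_processes(sample)
if instances > 1:
    print(f"{instances} nagios processes found; check for a failed restart")
```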

We configure nagios with centreon and we don't know if this is a problem
related to centreon or to the startup scripts (centreon uses them).

But, if we don't reload nagios, it runs without problems.

-- 
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información   _o)
y las Comunicaciones Aplicadas (ATICA)  / \\
http://www.um.es/atica                  _(___V
Tfo: 868887590
Fax: 86337



Re: [Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-19 Thread Marc Powell

On Jun 19, 2009, at 4:35 PM, Kustner, Tom wrote:

> I am a Nagios user, not the administrator.  We are running Nagios 2.9 on
> RHEL 4 or 5.  Overall, 200+ hosts with 3000 services being monitored. I
> have access for monitoring a smaller number of hosts.
>
> In another posting, I alluded to an issue where a host had gone down but
> no alert was sent out.   The issue surfaced again today and as was done
> the other time, Nagios was restarted to "fix" the problem.   I am
> naturally concerned about the unreliability.
>
> Any thoughts on this problem?  Specifically:

I am not aware of any problems such as you describe that a restart  
would 'fix'.

> * What are best practices for making sure Nagios does not fall down on
> the job?   Is there something not set right?

See my previous e-mail.

> * Are other Nagios administrators restarting Nagios on a weekly or
> nightly basis to keep it on the job?

I reload hourly to pick up new config changes but rarely restart.

> * Is this an issue specific to Nagios 2.9?  Was 2.9 a spotty version?

I'm not sure you even have an issue outside of a possible  
configuration problem. No, 2.9 was not spotty or bad.

> * For a given host, why would "active checks" be enabled, yet "N/A"
> appears in the "Next Active Check" field?

This is normal. See my previous e-mail.

--
Marc




[Nagios-users] How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

2009-06-19 Thread Kustner, Tom
I am a Nagios user, not the administrator.  We are running Nagios 2.9 on
RHEL 4 or 5.  Overall, 200+ hosts with 3000 services being monitored. I
have access for monitoring a smaller number of hosts.

In another posting, I alluded to an issue where a host had gone down but
no alert was sent out.   The issue surfaced again today and as was done
the other time, Nagios was restarted to "fix" the problem.   I am
naturally concerned about the unreliability.  

Any thoughts on this problem?  Specifically:

* What are best practices for making sure Nagios does not fall down on
the job?   Is there something not set right?

* Are other Nagios administrators restarting Nagios on a weekly or
nightly basis to keep it on the job?

* Is this an issue specific to Nagios 2.9?  Was 2.9 a spotty version?

* For a given host, why would "active checks" be enabled, yet "N/A"
appears in the "Next Active Check" field?

Thanks for any help.

-Tom Kustner-


The information contained in this message and any accompanying attachments may 
contain privileged, private and/or confidential information protected by state 
and federal law.  Penalties may be assessed for unauthorized use and/or 
disclosure.  This message and any attachments are intended for the designated 
recipient only.  If you have received this information in error, please notify 
the sender immediately and return or destroy the information.

This e-mail transmission and any attachments are believed to have been sent 
free of any virus or other defect that might affect any computer system into 
which it is received and opened. It is, however, the recipient's responsibility 
to ensure that the e-mail transmission and any attachments are virus free, and 
the sender accepts no responsibility for any damage that may in any way arise 
from their use.
