Re: [Nagios-users] Splunk Integration Question...

2013-09-10 Thread Frost, Mark {BIS}
Huh.   Where did those new options come from?  They weren't in the cgi.cfg docs 
the last time I looked :).

I agree, it's not terribly clear to me what that option does, but it does 
reference "Splunk IT" which is a special Splunk package that you can use for 
Splunk benchmarking.   That still doesn't make it clear what it's used for.

I see a second parameter, "splunk_url" that lets you specify the URL for your 
Splunk server.
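For reference, a cgi.cfg with the integration switched on carries two lines like the ones in the sample below. The option names enable_splunk_integration and splunk_url are the ones in the sample cgi.cfg shipped with Nagios Core 3.x (check your own copy); the Splunk host is a placeholder. The small reader is only a sketch for checking what a given file actually sets.

```python
# Sketch: read key=value options from a cgi.cfg-style file, skipping
# blank lines and '#' comments. The Splunk URL below is a made-up example.
def parse_cfg(text):
    """Return a dict of key=value pairs found in text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            options[key.strip()] = value.strip()
    return options


SAMPLE_CGI_CFG = """
# SPLUNK INTEGRATION OPTIONS
enable_splunk_integration=1
splunk_url=http://splunk.example.com:8000/
"""

opts = parse_cfg(SAMPLE_CGI_CFG)
splunk_enabled = opts.get("enable_splunk_integration") == "1"
print(splunk_enabled, opts.get("splunk_url"))
```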

Maybe it just somehow says to pepper the logs with your Splunk URL in 
appropriate places.


From: Sean Alderman
Sent: Tuesday, September 10, 2013 1:34 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Splunk Integration Question...

Just what's in the Nagios docs on cgi.cfg. The docs don't say much about what it 
does, so I'm a little curious what that setting is for.

- Sean Alderman
Senior Engineer, UDit Systems Integration

This message has been brought to you by Android Bionic.
On Sep 10, 2013 1:10 PM, "Frost, Mark {BIS}" wrote:

Can you describe what you're doing for Splunk integration with Nagios?   I've 
used Splunk with Nagios in a couple different ways, but I'm not aware of any 
single standard for doing so.

Originally, I just had Splunk run a scheduled search, which would trigger a 
script which sent a passive check result back to a Nagios service via NSCA.   
That way - having Nagios process passive check results from Splunk - was the 
only way I could see to do that.
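A minimal sketch of the glue script a Splunk scheduled search could trigger. It assumes send_nsca is installed and reads records on standard input in its default tab-delimited form (host, service description, return code, plugin output); the host name, service name, and config path are hypothetical.

```python
# Sketch: turn a Splunk alert into a passive service check result for NSCA.
import subprocess

STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output):
    """Format one passive service check result as send_nsca input."""
    code = STATES[state]
    # Tabs or newlines inside the output would break the record format.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{code}\t{clean}\n"


def submit(line, nsca_host, config="/etc/nagios/send_nsca.cfg"):
    """Pipe the record to send_nsca (binary on PATH and config path are assumptions)."""
    subprocess.run(
        ["send_nsca", "-H", nsca_host, "-c", config],
        input=line, text=True, check=True,
    )


print(repr(nsca_line("web01", "Splunk errors", "CRITICAL", "42 errors in 15m")))
```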

Recently, I played around a bit with writing scripts that made use of Splunk's 
REST API so the checks could be run as active checks from Nagios.  (I always 
prefer active checks).   I set this up for only one check, but once I got it 
working it worked pretty well.
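A minimal sketch of the active-check side, assuming some hypothetical helper has already run the Splunk search over the REST API and returned an event count (the REST call itself is not shown). It only illustrates the standard Nagios plugin contract: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one status line, and optional performance data after a '|'. The thresholds are made-up values.

```python
# Sketch of an active check wrapped around a Splunk search. The REST query is
# omitted: assume a hypothetical helper already returned the event count.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(count, warn, crit):
    """Map an event count to a (plugin exit code, status line) pair."""
    if count is None:  # search failed or returned nothing usable
        return UNKNOWN, "UNKNOWN - no result from Splunk search"
    if count >= crit:
        state = CRITICAL
    elif count >= warn:
        state = WARNING
    else:
        state = OK
    # Everything after '|' is performance data: label=value;warn;crit
    return state, f"{LABELS[state]} - {count} events|events={count};{warn};{crit}"


code, text = evaluate(count=7, warn=5, crit=10)  # 7 stands in for a real result
print(text)  # a real plugin would finish with sys.exit(code)
```

Because the exit code carries the state, Nagios itself needs no Splunk-specific logic; the script only has to finish within the check timeout.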

As a side note, I'm still a little on the fence about whether or not I really 
want to have Nagios find problems through Splunk and then alert on them or have 
Splunk find and alert on them directly without using Nagios at all...

Are you referring to another way of making Splunk and Nagios talk together?


From: Sean Alderman 
Sent: Monday, September 09, 2013 1:12 PM
Subject: [Nagios-users] Splunk Integration Question...

I was hoping I might find someone who's got the Splunk integration actively 
working.  I'm running Nagios Core (via EPEL) and Splunk 5.0.3 on OracleLinux. 
When I edit cgi.cfg and enable Splunk integration, then set the Splunk URL, 
I notice the Nagios URLs look like: 
https://:8000/en-US/app/flashtimeline?<>%20.  I have two questions...
* Is there a way I can make Nagios use the hostname only, not the FQDN? 
We use short names in Splunk so we don't have a mix of FQDNs and short names, 
since we use both forwarders and syslog as input.
* What data is this query looking for? Is it expected that I should 
have my Nagios log in Splunk?  The  in the query doesn't 
seem useful to me unless there's Splunk data specifically tied to that check, 
and I'm hoping someone could provide an example.
Kind regards,
Sean M. Alderman
Senior Engineer, UDit Systems Integration and Engineering
University of Dayton

Nagios-users mailing list
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue.
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Splunk Integration Question...

2013-09-10 Thread Frost, Mark {BIS}

Can you describe what you're doing for Splunk integration with Nagios?   I've 
used Splunk with Nagios in a couple different ways, but I'm not aware of any 
single standard for doing so.

Originally, I just had Splunk run a scheduled search, which would trigger a 
script which sent a passive check result back to a Nagios service via NSCA.   
That way - having Nagios process passive check results from Splunk - was the 
only way I could see to do that.

Recently, I played around a bit with writing scripts that made use of Splunk's 
REST API so the checks could be run as active checks from Nagios.  (I always 
prefer active checks).   I set this up for only one check, but once I got it 
working it worked pretty well.

As a side note, I'm still a little on the fence about whether or not I really 
want to have Nagios find problems through Splunk and then alert on them or have 
Splunk find and alert on them directly without using Nagios at all...

Are you referring to another way of making Splunk and Nagios talk together?


From: Sean Alderman
Sent: Monday, September 09, 2013 1:12 PM
Subject: [Nagios-users] Splunk Integration Question...

I was hoping I might find someone who's got the Splunk integration actively 
working.  I'm running Nagios Core (via EPEL) and Splunk 5.0.3 on OracleLinux. 
When I edit cgi.cfg and enable Splunk integration, then set the Splunk URL, 
I notice the Nagios URLs look like: 
https://:8000/en-US/app/flashtimeline?  I have two questions...
* Is there a way I can make Nagios use the hostname only, not the FQDN? 
We use short names in Splunk so we don't have a mix of FQDNs and short names, 
since we use both forwarders and syslog as input.
* What data is this query looking for? Is it expected that I should 
have my Nagios log in Splunk?  The  in the query doesn't 
seem useful to me unless there's Splunk data specifically tied to that check, 
and I'm hoping someone could provide an example.
Kind regards,
Sean M. Alderman
Senior Engineer, UDit Systems Integration and Engineering
University of Dayton

[Nagios-users] Nagios 4 beta plan/status/roadmap?

2013-08-14 Thread Frost, Mark {BIS}
Unless I missed a message somewhere, Nagios 4 is still in beta.   Is there an 
expected time when it will become a regular release?   It's been out for a bit 
now.  Does it seem as if there will be a "beta2" or is this effectively the 
release candidate?

I have seen some new patches flowing in (mostly feature stuff, I think) and I 
wasn't sure if those were eventually to be included with Nagios 4 or not - 
possibly in a later release.

I've had good luck with the pre-beta versions of Nagios 4 and am contemplating 
timing for a move to Nagios 4 (and Merlin).




Re: [Nagios-users] Monitor windows mapped drive

2013-06-11 Thread Frost, Mark {BIS}

I've only ever found two ways to do this.

1)  We were using NSClient++ in the same manner you are (running it as a 
domain user ID that had permission to access that network drive).  Even so, I 
think we did not try to access it by the drive letter.   I'm pretty 
sure we used the UNC path to the disk.  Unless I'm mistaken, the drive 
letter is only mapped after a user logs in, and the service that NSClient++ runs 
as does not replicate a login shell that would trigger the drive to be mapped to 
a drive letter.

2)  We are using check_disk_smb from the Nagios plugins package to query 
the UNC share from our Linux machine and check available space.
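A small sketch for the check_disk_smb route: split a UNC path into the server and share that the plugin takes as its -H and -s arguments, and build the command line. The server, share, account, and thresholds are placeholders, and the -w/-c values are assumed to be percent-used thresholds; confirm against check_disk_smb --help on your system.

```python
# Sketch: build a check_disk_smb command line from a UNC path.
def parse_unc(path):
    r"""Split '\\server\share[\sub\dir]' into ('server', 'share')."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"UNC path needs a server and a share: {path!r}")
    return parts[0], parts[1]


def check_disk_smb_args(unc, user, password, warn=85, crit=95):
    """Return an argv list for check_disk_smb (placeholder credentials)."""
    server, share = parse_unc(unc)
    return [
        "check_disk_smb",
        "-H", server,
        "-s", share,
        "-u", user,
        "-p", password,
        "-w", str(warn),  # assumed: percent of space used -> WARNING
        "-c", str(crit),  # assumed: percent of space used -> CRITICAL
    ]


print(check_disk_smb_args(r"\\fileserver\data", "svc_nagios", "secret"))
```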


From: Sunil Sankar
Sent: Tuesday, June 11, 2013 9:11 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Monitor windows mapped drive


Any update on this?


On Sat, Jun 8, 2013 at 8:26 PM, Sunil Sankar wrote:
Has anyone monitored a mapped drive in Windows using NSClient++? I am not able 
to get it working and need your help.
I have mapped drive Z: in Windows, and I have also started NSClient++ as the 
same user that mapped the drive.

When I execute the check I am getting the following output
[root@nagios4 ~]# /opt/nagios/libexec/check_nrpe -H -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90% FilterType=REMOTE

  OK: All drives within bounds.
[root@nagios4 ~]#
[root@nagios4 ~]#  /opt/nagios/libexec/check_nt -H -p 12489 -v 
NSClient++ 0,4,1,101 2013-05-18
[root@nagios4 ~]#  /opt/nagios/libexec/check_nrpe -H -p 5666
I (0,4,1,101 2013-05-18) seem to be doing fine...
[root@nagios4 ~]#

[root@nagios4 ~]# /opt/nagios/libexec/check_nrpe -H -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90%
OK: C:\: 10.1G|'C:\ %'=51%;80;90 'C:\'=10.056G;15.725;17.691;0;19.656
[root@nagios4 ~]#

Sunil Sankar

Sunil Sankar

Re: [Nagios-users] Help with CPU Check Thresholds

2013-04-02 Thread Frost, Mark {BIS}

From: nap
Sent: Tuesday, April 02, 2013 2:48 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Help with CPU Check Thresholds

On Mon, Apr 1, 2013 at 2:05 PM, Scott Wilkerson wrote:

It would be very common for a machine with multiple CPUs to have a
higher load.  On a 16-CPU machine, no processes would be waiting at all
with a load under 16.

Yes, if all the load is consumed by CPUs, but that's quite rare. For example, 
with a database most of the time is spent on disk I/O, and a 16-CPU server won't 
help there; you can be overloaded with a load average > 1 in that case.
The "good" load average is very specific to each server/application. The 
number of CPUs is just one part of the "load" equation.


I think that's rather a misunderstanding of what load average is.   It is not 
based on the CPU (otherwise it would be called something stupid like CPU % on 
Windows).  There are lots of resources on the Googles that explain what goes 
into the load average calculation so I won't go into it here, but it is intended 
to be a general number indicating the load of the system, and I/O wait times are 
definitely a factor in that calculation.   It is intended to be a 
representation of how busy a machine is overall.   The general rule, as Scott 
indicates, is that for a machine with 'n' CPUs (or cores, roughly), if the load 
average is also 'n' then that machine is 100% utilized.  That is, it is 
completely able to keep up with what it's being asked to do.   If the load 
average is '2n' then it's 200% utilized and the machine is effectively getting

Anyway, there are some really good write-ups on how the load average is 
calculated, but it's definitely not just CPU.  CPU utilization by 
itself is pretty useless nowadays, in my opinion.
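The rule of thumb above is simple arithmetic: divide the load average by the number of CPUs (cores). A minimal sketch; keep in mind that on Linux the load average also counts processes blocked in uninterruptible (typically disk) I/O, so a high percentage does not necessarily mean the CPUs themselves are busy.

```python
# Load average expressed as a percentage of available CPUs (cores).
import os


def load_percent(load_avg, n_cpus):
    """Return load_avg / n_cpus as a percentage."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return 100.0 * load_avg / n_cpus


print(load_percent(16, 16))  # load 'n' on n CPUs  -> 100% utilized
print(load_percent(32, 16))  # load '2n' on n CPUs -> 200% utilized

if hasattr(os, "getloadavg"):  # Unix-only
    one_min, _five, _fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1-minute load is {load_percent(one_min, cpus):.0f}% of {cpus} CPUs")
```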


Re: [Nagios-users] Nagios not working except for local host

2013-01-08 Thread Frost, Mark {BIS}

Remember that the UI shows you hosts and services based on the user you're 
logged into the UI as (or the user that Apache thinks you are and then passes 
on to the Nagios CGIs).   The hosts/services shown to a user ID in the UI 
are based either on what Nagios thinks that user is a contact for (i.e. that ID 
is a contact for a given host or service, therefore you see it in the UI) or 
on the settings you have for that user in your cgi.cfg file.

So if you're logged in as "martin" in the UI and "martin" doesn't match a 
contact in Nagios, then you would want to change your cgi.cfg to give "martin" 
permission to, say, view all hosts, etc.
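A minimal sketch of that rule, assuming the standard authorized_for_all_hosts / authorized_for_all_services options in cgi.cfg. The user names other than "martin" are illustrative, and the check is simplified: the real CGIs also apply per-object contact membership and the other authorized_for_* options. The name listed must match what Apache passes to the CGIs as the logged-in user.

```python
# Illustrative cgi.cfg excerpt; "martin" is the user from this thread.
CGI_CFG = """
use_authentication=1
authorized_for_all_hosts=nagiosadmin,martin
authorized_for_all_services=nagiosadmin,martin
"""


def authorized_users(cfg_text, option):
    """Return the set of user names listed for one authorized_for_* option."""
    for line in cfg_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == option:
            return {name.strip() for name in value.split(",") if name.strip()}
    return set()


def can_view_host(user, host_contacts, cfg_text):
    """Simplified rule: visible if the user is a contact for the host
    or is listed in authorized_for_all_hosts."""
    return user in host_contacts or user in authorized_users(
        cfg_text, "authorized_for_all_hosts"
    )


print(can_view_host("martin", {"nagiosadmin"}, CGI_CFG))  # True, via cgi.cfg
print(can_view_host("alice", {"nagiosadmin"}, CGI_CFG))   # False
```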


-Original Message-
From: Martin Hugo
Sent: Tuesday, January 08, 2013 8:30 AM
To: 'Nagios Users List'
Subject: [Nagios-users] Nagios not working except for local host


My Nagios server died and I am having to rebuild (ran well for the best part of 
two years).  I am running Nagios Core 3.4.3 on Ubuntu 12.04 and have added 
pnp4nagios 0.6.19.  Nagios and pnp4nagios were installed from source. I believe 
I have configured it correctly but I have obviously missed something.

My Nagios pre-flight check shows no errors (one warning about a host with no 
services assigned - which I am aware of), it correctly lists the number of 
hosts and services but, when I go to the web interface, it only shows the local 

Can anyone suggest where I might start troubleshooting this?

Thanks very much.

Martin T. Hugo
Network Administrator
Hilliard City Schools
614-921-7102 (Ph)
614-921-7243 (Fax)


Re: [Nagios-users] solutions for off-server PNP4Nagios perfdata processing?

2012-10-03 Thread Frost, Mark {BIS}

As I understand it, the issue is less about Nagios and more about npcd.   
Nagios merrily produces the perfdata files and then npcd comes along and scoops 
them up, but as it's processing them it's opening a lot of rrd files and 
inserting data into them.   So really it's npcd that's the problem.  Well, not 
really a problem, but ultimately it's doing its thing and then Nagios gets less 
than a fair share of the box's I/O.   It's not that it's horrible right now, 
but we're starting to notice it and I would tend to be concerned about scaling 

Honestly, even with Nagios 3, it seems like Nagios' own I/O is entirely 
manageable so far with strategic use of a RAM disk.   It's just putting Nagios 
and PNP4Nagios (plus Apache to serve up the graph contents, which I'm also not 
happy having on the same server) on the same boxes that I don't like.

Hmm.  I was unaware that rrdcached could be configured to receive data over the 
network.   I'm assuming that means that npcd can be configured to send.   I'll 
check that out.  Still doesn't feel like an elegant solution, but it may fit 
the bill.
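A minimal sketch of what sending to rrdcached over the network amounts to at the protocol level, assuming rrdcached on the graphing server was started with a TCP listener (its -l option; 42217 is its default port) and that the RRD files live on that server. rrdcached speaks a line-based text protocol in which an update is one "UPDATE <file> <timestamp>:<value>..." line. The path is hypothetical and is resolved on the rrdcached side; in practice you would point rrdtool or the perfdata processor at the daemon rather than hand-roll a client.

```python
# Sketch: send one RRD update to a remote rrdcached over TCP.
import socket

RRDCACHED_PORT = 42217  # rrdcached's default TCP port


def update_command(rrd_path, timestamp, *values):
    """Build one protocol line: UPDATE <file> <timestamp>:<v1>[:<v2>...]"""
    fields = ":".join(str(v) for v in (timestamp, *values))
    return f"UPDATE {rrd_path} {fields}\n"


def send_update(host, rrd_path, timestamp, *values, port=RRDCACHED_PORT, timeout=5.0):
    """Send one UPDATE; return the daemon's status line (negative number = error)."""
    line = update_command(rrd_path, timestamp, *values)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))
        with sock.makefile("r") as reply:
            return reply.readline().strip()


print(update_command("/var/lib/rrd/web01/load.rrd", 1349270000, 0.5, 0.7), end="")
```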



From: Daniel Wittenberg
Sent: Wednesday, October 03, 2012 11:08 AM
To: Nagios Users List
Subject: Re: [Nagios-users] solutions for off-server PNP4Nagios perfdata processing?

You might consider looking at 4.0, since its disk I/O is almost nothing, but short 
of that, have you looked at using rrdcached to send the processing to another server?


On Oct 3, 2012, at 9:33 AM, Frost, Mark {BIS} wrote:

Hello.  Has anyone come up with solutions for processing Nagios performance 
data on a server other than a Nagios server?   We've been processing perfdata 
results on our Nagios server(s) for a while now and increasingly it's just 
eating up too much I/O to make me comfortable.

Yes, we do use rrdcached and yes, I realize that shuffling data around on 
different disk spindles and controllers would help, but in today's world where 
companies don't like building any kind of physical server let alone one with 
all that additional hardware, that's not entirely an option for us.

I realize that once the perfdata files are on the dedicated graphing server(s), 
processing them into RRD files there should be a no-brainer.  My problem is 
figuring out how to get them there without say, using a NAS device.   (If I/O's 
a problem locally, I don't want to shuffle that I/O to an even slower network 

It would be ideal if somehow there was a process that I could just send that 
data to and have it picked up remotely.  Like if maybe Merlin had a special 
kind of peer that just received a stream of perfdata or something.  Anything 
else I could imagine would be some kind of home-grown solution like say pumping 
events into a messaging system from the Nagios server(s) and then letting the 
graphing server pick them up from the message queue(s).  I could also imagine 
some kind of fancy-pants module in Nagios 4 that did something like this, maybe.

Any thoughts would be appreciated.





Re: [Nagios-users] solutions for off-server PNP4Nagios perfdata processing?

2012-10-03 Thread Frost, Mark {BIS}

My concern is more about the actual I/O to the RRD files and not so much 
processing the to-be-processed perfdata files (i.e. temporary files).   The 
heavy I/O is happening on the RRD filesystem and since I would of course need 
the RRD files to persist, I would not want to store them on a ram disk.  Plus 
it would need to be a fairly large ram disk to hold all the rrd files even if I 
were willing to lose them all if a reboot occurred.
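To put a number on "a fairly large ram disk", a short sketch that totals the size of every .rrd file under a directory tree. The PNP4Nagios perfdata directory in the usage line is an assumption (the default for a source install); adjust it for your layout.

```python
# Rough sizing: total size of all *.rrd files under a directory tree.
import os


def rrd_footprint(root):
    """Return (file_count, total_bytes) for every .rrd file below root."""
    count = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rrd"):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


if __name__ == "__main__":
    # Assumed PNP4Nagios default location for a source install; adjust as needed.
    n, size = rrd_footprint("/usr/local/pnp4nagios/var/perfdata")
    print(f"{n} RRD files, {size / 2**30:.2f} GiB")
```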

We do use ram disks for Nagios status.dat files and spool files (i.e. things I 
can afford to lose in a reboot/crash) and it’s definitely been a good thing.   
It still seems weird to have to do so much “compensating” for Nagios’ normal 
operations for a moderately large installation (not really even huge) to make 
it work well.   I’m guessing again that this is going to be vastly improved 
with Nagios 4 as well.  At least no spool files.



From: davor grgicevic
Sent: Wednesday, October 03, 2012 10:45 AM
To: Nagios Users List
Subject: Re: [Nagios-users] solutions for off-server PNP4Nagios perfdata processing?

Hi Mark,

Did you try using a RAM disk?

On Wed, Oct 3, 2012 at 4:33 PM, Frost, Mark {BIS} wrote:
Hello.  Has anyone come up with solutions for processing Nagios performance 
data on a server other than a Nagios server?   We’ve been processing perfdata 
results on our Nagios server(s) for a while now and increasingly it’s just 
eating up too much I/O to make me comfortable.

Yes, we do use rrdcached and yes, I realize that shuffling data around on 
different disk spindles and controllers would help, but in today’s world where 
companies don’t like building any kind of physical server let alone one with 
all that additional hardware, that’s not entirely an option for us.

I realize that once the perfdata files are on the dedicated graphing server(s), 
processing them into RRD files there should be a no-brainer.  My problem is 
figuring out how to get them there without say, using a NAS device.   (If I/O’s 
a problem locally, I don’t want to shuffle that I/O to an even slower network 

It would be ideal if somehow there was a process that I could just send that 
data to and have it picked up remotely.  Like if maybe Merlin had a special 
kind of peer that just received a stream of perfdata or something.  Anything 
else I could imagine would be some kind of home-grown solution like say pumping 
events into a messaging system from the Nagios server(s) and then letting the 
graphing server pick them up from the message queue(s).  I could also imagine 
some kind of fancy-pants module in Nagios 4 that did something like this, maybe.

Any thoughts would be appreciated.




Davor Grgicevic

[Nagios-users] solutions for off-server PNP4Nagios perfdata processing?

2012-10-03 Thread Frost, Mark {BIS}
Hello.  Has anyone come up with solutions for processing Nagios performance 
data on a server other than a Nagios server?   We've been processing perfdata 
results on our Nagios server(s) for a while now and increasingly it's just 
eating up too much I/O to make me comfortable.

Yes, we do use rrdcached and yes, I realize that shuffling data around on 
different disk spindles and controllers would help, but in today's world where 
companies don't like building any kind of physical server let alone one with 
all that additional hardware, that's not entirely an option for us.

I realize that once the perfdata files are on the dedicated graphing server(s), 
processing them into RRD files there should be a no-brainer.  My problem is 
figuring out how to get them there without say, using a NAS device.   (If I/O's 
a problem locally, I don't want to shuffle that I/O to an even slower network 

It would be ideal if somehow there was a process that I could just send that 
data to and have it picked up remotely.  Like if maybe Merlin had a special 
kind of peer that just received a stream of perfdata or something.  Anything 
else I could imagine would be some kind of home-grown solution like say pumping 
events into a messaging system from the Nagios server(s) and then letting the 
graphing server pick them up from the message queue(s).  I could also imagine 
some kind of fancy-pants module in Nagios 4 that did something like this, maybe.
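A minimal sketch of the home-grown message-queue idea: split a Nagios performance data string into records and put one message per metric on a queue. queue.Queue stands in for a real message broker, and the parser handles only the common 'label'=value[UOM];warn;crit;min;max form. The sample string is the CheckDriveSize output quoted elsewhere in this archive; the host and service names are hypothetical.

```python
# Sketch: perfdata string -> one message per metric on a queue.
import queue
import re

# 'label'=value[UOM];[warn];[crit];[min];[max]  (label may be single-quoted)
PERF_RE = re.compile(r"('[^']*'|[^\s=]+)=([-\d.]+)([^;\s]*)((?:;[^;\s]*){0,4})")


def parse_perfdata(perfdata):
    """Return a list of dicts, one per metric in a perfdata string."""
    records = []
    for label, value, uom, rest in PERF_RE.findall(perfdata):
        extra = (rest[1:].split(";") if rest else []) + [""] * 4
        warn, crit, vmin, vmax = extra[:4]
        records.append({"label": label.strip("'"), "value": float(value), "uom": uom,
                        "warn": warn, "crit": crit, "min": vmin, "max": vmax})
    return records


def publish(q, host, service, perfdata):
    """Put one message per metric on q (a stand-in for a real broker)."""
    for record in parse_perfdata(perfdata):
        q.put({"host": host, "service": service, **record})


SAMPLE = "'C:\\ %'=51%;80;90 'C:\\'=10.056G;15.725;17.691;0;19.656"
events = queue.Queue()
publish(events, "winhost01", "Disk C", SAMPLE)
print(events.qsize(), "messages queued")
```

The graphing server would then consume these messages and write its own RRD files, so only small messages cross the network.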

Any thoughts would be appreciated.




Re: [Nagios-users] Using "contacts" in host definition -- Bug ?

2012-09-27 Thread Frost, Mark {BIS}

> -Original Message-
> From: Andreas Ericsson
> Sent: Thursday, September 27, 2012 3:08 PM
> To: Nagios Users List
> Cc: Frost, Mark {BIS}
> Subject: Re: [Nagios-users] Using "contacts" in host definition -- Bug ?

> On 09/26/2012 09:25 AM, Frost, Mark {BIS} wrote:
>> I believe this is a "feature" introduced in 3.3 or thereabouts.  I've
>> always found it very irritating and wish there was some way to turn
>> off inheriting host contacts/contactgroups to services as it's never
>> what I want.

> It's a half misfeature. The intention was (and is) that services with no
> contactgroups OR contacts should inherit the ones from the host, but it
> was coded up so that if the service had contactgroups (but not contacts)
> it would inherit contacts (but not contactgroups) from the host. The same
> applied when the service had contacts but no contactgroups and the host
> had contactgroups.

> In 4.0 this is fixed so only services with neither contacts nor contact-
> groups inherit them from the host.

> -- 
> Andreas Ericsson

I still see that as a misfeature.   I would rather have the preflight
check tell me there's an error because I forgot to define contacts
for the service than have it assume that I want anything
inherited from a host definition.   I'd be OK with that being the default
behavior if it were configurable, but I'd be the first to disable it
in nagios.cfg.  Yeah, I know,
patches gleefully accepted :-).
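A toy model (not Nagios source code) of the two inheritance rules Andreas describes, to make the difference concrete. Contact and group names are hypothetical.

```python
# Toy model of service contact inheritance from the host.
def inherit_3x(host, svc):
    """3.x behaviour as described: each attribute is inherited independently."""
    return {
        "contacts": svc["contacts"] or host["contacts"],
        "contact_groups": svc["contact_groups"] or host["contact_groups"],
    }


def inherit_40(host, svc):
    """4.0 behaviour: inherit only when the service has neither attribute."""
    if svc["contacts"] or svc["contact_groups"]:
        return dict(svc)
    return {"contacts": host["contacts"], "contact_groups": host["contact_groups"]}


HOST = {"contacts": ["oncall-phone"], "contact_groups": ["admins"]}
SVC = {"contacts": [], "contact_groups": ["web-team"]}
print(inherit_3x(HOST, SVC))  # picks up the host's oncall-phone contact
print(inherit_40(HOST, SVC))  # keeps only the service's own contact group
```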



Re: [Nagios-users] Using "contacts" in host definition -- Bug ?

2012-09-26 Thread Frost, Mark {BIS}
I believe this is a "feature" introduced in 3.3 or thereabouts.  I've
always found it very irritating and wish there was some way to turn
off inheriting host contacts/contactgroups to services as it's never
what I want.


-Original Message-
From: Schimpke, Dr. Thomas - bhn
Sent: Wednesday, September 26, 2012 9:58 AM
Subject: [Nagios-users] Using "contacts" in host definition -- Bug ?


I noticed some strange behaviour if I add a contact to a host definition
in Nagios 3.3.2 (or so) and 3.4.1. It seems, that if I add a contact to
a host, this contact will be automatically added to all services on this
host also - I've checked this using the objects cache file. This kind of
inheritance does not seem to happen with normal contact groups.

From the documentation:  "This is a list of the short names of the
contacts that should be notified whenever there are problems (or
recoveries) with this host..." 

One may read this like: ...the list of contacts is notified for each and
every problem (even in the services) associated with the host. 

But for the contact_groups: "This is a list of the short names of the
contact groups that should be notified whenever there are problems (or
recoveries) with this host." 

And there, this kind of inheritance does not take place. So at least the
behaviour is inconsistent - but I suspect this is a bug. You may want to
get a phone call in the middle of the night because your server went
down. But I don't think you want a call for *all* services on
the host...




Re: [Nagios-users] Any upcoming release?

2012-05-09 Thread Frost, Mark {BIS}
> -Original Message-
> From: Andreas Ericsson
> Sent: Wednesday, May 09, 2012 3:53 PM
> To: Nagios Users List
> Cc: Frost, Mark {BIS}
> Subject: Re: [Nagios-users] Any upcoming release?

> On 05/09/2012 07:45 PM, Frost, Mark {BIS} wrote:

>> Andreas,
>> I'm a little confused about this.   I've been eagerly awaiting these
>> gee-wiz-bang space-age changes, but when I looked over the change list
>> for 3.4.0 that Ethan sent they seem like mostly minor changes.  Or
>> perhaps they just don't describe things in enough detail to match up the
>> rather significant architectural changes listed above.
>> Is this the super-summarized bullet item that refers to the change above?
>>  * Use execv() to execute active check commands (#86 - Ton Voon, 
>> dnsmichi)

> Nopes, it's not, and that patch is actually broken. My code still leaks (about
> 1MB per 24 hours with 1000 checks / second), so I've held it back a bit. I
> didn't know they were going to hit the release button so fast, and without a
> beta period.

> -- 
> Andreas Ericsson

Aha!  Thanks.  Yeah, I was a little struck by how this seemed more like a 3.3.2
release than a release that indicated something significant going on under
the covers.

I guess the question now is: when the I/O broker is ready, will that be
a 3.4.1 release?   That seems kind of major for a point release.



Re: [Nagios-users] Any upcoming release?

2012-05-09 Thread Frost, Mark {BIS}

> -Original Message-
> From: Andreas Ericsson [] 
> Sent: Thursday, April 05, 2012 4:34 AM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Any upcoming release?

> On 04/05/2012 07:31 AM, Yu Watanabe wrote:
>> Hi all!
>> I would like to know if there are any plans for the nagios v 3.4.x.
>> It has been a while since the last release so I was very curious about it.

> There is. Nagios 3.4 will be a single-threaded and event-driven application
> that sports an I/O-broker and vastly improved check performance. In essence,
> we've removed 2 fork() calls, 4 disk searches, 2 filewrites and 2 filereads
> from each check being performed. There's also a fixed usage of the current
> scheduling queue implementation which turns scheduling new checks from its
> current O(n) behaviour to O(1). This will provide a huge benefit for large
> installations, and combined with the worker process code we're currently
> seeing a 12-fold increase in the amount of checks Nagios can execute, but
> it's still too early to tell what other things are affected. The external
> command pipe might be a bottleneck if one uses large amounts of passive
> checks, for example.

> It's currently in late alpha, so beta releases should be available in a
> month or so.

> -- 
> Andreas Ericsson


I'm a little confused about this.   I've been eagerly awaiting these
gee-whiz-bang space-age changes, but when I looked over the change list
for 3.4.0 that Ethan sent, they seem like mostly minor changes.  Or
perhaps they just don't describe things in enough detail to match up with the
rather significant architectural changes listed above.

Is this the super-summarized bullet item that refers to the change above?

* Use execv() to execute active check commands (#86 - Ton Voon, 
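For anyone wondering what that bullet means in practice: running a check through execv() skips the intermediate /bin/sh that a shell-based launch needs. The Python sketch below is only an analogy of one plausible way such a change could work, not Nagios's C code: it execs the command directly when the command line contains no shell metacharacters and falls back to `sh -c` otherwise. The metacharacter set and the function name `run_check` are assumptions made for illustration.

```python
import shlex
import subprocess

# Characters that need a shell to interpret them. This set is an illustrative
# assumption, not the exact list any Nagios release checks for.
SHELL_METACHARS = set("|&;<>()$`\\\"'*?[]#~=\n")


def run_check(command: str, timeout: float = 10.0):
    """Run a check command, exec'ing it directly when no shell features are needed.

    Returns a (mode, CompletedProcess) pair where mode is "direct" or "shell".
    """
    if SHELL_METACHARS.isdisjoint(command):
        # Plain "program arg arg" line: split it ourselves and exec the program
        # directly, so no /bin/sh process is started for this check.
        argv = shlex.split(command)
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return "direct", proc
    # Pipes, redirects, variables, globs or quoting: hand the line to /bin/sh -c.
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return "shell", proc
```

A typical plugin invocation such as `check_ping -H host -w 100.0,20% -c 500.0,60%` would take the direct path; only command lines that genuinely need shell parsing pay for the extra process.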




Re: [Nagios-users] How can I change Nagios from email address ?

2012-05-08 Thread Frost, Mark {BIS}
Depending on how you send your mail messages, you could just get away with 
using a command-line argument.

In our case, the notification commands send mail using "Mail" (or "mailx" -- I 
can't remember off the top of my head).   So we've modified the e-mail 
notification commands in the Nagios config to add


Assuming you use Mail or mailx, check the man pages for those on your local OS 
to make sure your version supports that option, but I thought most modern Linux distributions did.

Note that this changes the Reply-To address your messages appear to come from.  That's 
been more than sufficient for us and makes Nagios messages appear the way we 
want them to.
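If your notification command hands the message to a script rather than to Mail/mailx, you can get the same effect by setting the From header yourself. A minimal Python sketch, assuming an SMTP relay listening on localhost and placeholder example.com addresses (neither comes from this thread); the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a notification message with an explicit From header."""
    msg = EmailMessage()
    msg["From"] = sender  # e.g. a real, routable address instead of user@localhost
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notification(msg: EmailMessage, relay: str = "localhost") -> None:
    """Hand the message to an SMTP relay (assumed to listen on port 25)."""
    with smtplib.SMTP(relay) as smtp:
        smtp.send_message(msg)
```

When no explicit envelope sender is passed, smtplib's `send_message()` takes it from the message headers, so the receiving systems see the same address in both places.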


From: özgür umut vurgun []
Sent: Tuesday, May 08, 2012 3:29 AM
Subject: [Nagios-users] How can I change Nagios from email address ?

Hi All,

I'd like to change the e-mail address Nagios sends from. Right now it is 
"admin@hostname-nagios.localhost", but many systems don't accept this e-mail 
address, so I'd like to change it to a real e-mail address. I have searched the 
internet but haven't had any success.

How can I do it ?


Özgür Umut VURGUN


Re: [Nagios-users] "Acknowledge this service" link is missing

2012-03-14 Thread Frost, Mark {BIS}

I don't know whether this is it, but you can only acknowledge something that is in 
a hard critical state.   That is, it has to have reached its threshold of consecutive 
failures (max_check_attempts) before it is considered to be in a hard state.   The service 
details should tell you if it is, but I've had times where I've thought the same thing 
only to realize that the service hadn't gotten all the way to a hard state just yet.
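To make the soft/hard distinction concrete, here is a simplified Python model of one service's state. It is a sketch, not Nagios's code: it ignores flapping, volatile services and soft recoveries, and it lets any hard non-OK state be acknowledged, which is an assumption beyond the CRITICAL case discussed here. `max_check_attempts` is the real Nagios directive; the class and method names are hypothetical.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin states


@dataclass
class ServiceState:
    """Simplified soft/hard state tracking for one service."""
    max_check_attempts: int
    state: int = OK
    current_attempt: int = 1
    hard: bool = True

    def record(self, result: int) -> None:
        if result == OK:
            # Any OK result clears the problem and resets the attempt counter.
            self.state, self.current_attempt, self.hard = OK, 1, True
            return
        if self.state == OK:
            # First non-OK result after being OK: a new soft problem at attempt 1.
            self.current_attempt, self.hard = 1, False
        elif not self.hard:
            # Still soft: count another consecutive failure.
            self.current_attempt += 1
        self.state = result
        if self.current_attempt >= self.max_check_attempts:
            self.hard = True

    def can_acknowledge(self) -> bool:
        # Only a problem that has reached a hard state offers the acknowledge link.
        return self.state != OK and self.hard
```

With `max_check_attempts=3`, the first two CRITICAL results leave the service soft (no acknowledge option); the third makes it hard and acknowledgeable.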


From: Andrew Thompson []
Sent: Wednesday, March 14, 2012 8:44 AM
Subject: [Nagios-users] "Acknowledge this service" link is missing

Hi all

Nagios 3.3.1 on Ubuntu 11.04 Desktop.

I have just come to acknowledge a critical service and, much to my surprise, I 
don’t have the option to.

Is this a known bug? Any help appreciated.

Kind Regards

T: 01386 834000
F: 01386 834100

Fulgent Technologies Limited, Haddonsacre, Station Road, Offenham, Evesham, 
WR11 8JJ. This communication contains information which is confidential and may 
also be privileged or protected by copyright. It is for the exclusive use of 
the addressee. If you are not the addressee please note that any distribution, 
reproduction, copying, publication or use of this communication or the 
information is prohibited. If you have received this communication in error, 
please telephone us immediately and also delete the communication from your 


Re: [Nagios-users] Transient errors

2012-03-01 Thread Frost, Mark {BIS}

I'm afraid I don't have a simple answer for you there.   It sounds like you're
monitoring some things that are far away network-wise.   If this were my
environment I would try to set up a distributed Nagios installation with
locally situated Nagios servers to monitor services that were local. You
could either use Merlin and a poller/NOC setup or possibly something
like "Multi-Site" to allow you to see all the different locations from one
central location.

If you're talking about things like ping for host checks (or better yet
'fping'), then you should be able to raise the thresholds to allow
for longer round-trip times.
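check_ping takes its warning and critical thresholds as `<rta>,<pl>%` pairs (for example `-w 100.0,20% -c 500.0,60%`). The Python sketch below mimics that classification so you can see how raising the round-trip component for a distant site changes the outcome. Whether a value exactly at a threshold trips it, and the function names, are assumptions, not taken from check_ping's source.

```python
def parse_threshold(spec: str) -> tuple[float, float]:
    """Parse a check_ping-style 'rta,pl%' threshold, e.g. '100.0,20%'."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify(rta_ms: float, loss_pct: float, warn: str, crit: str) -> str:
    """Return OK/WARNING/CRITICAL for one ping result (>= trips a threshold here)."""
    crit_rta, crit_loss = parse_threshold(crit)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    warn_rta, warn_loss = parse_threshold(warn)
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"
```

For a link that routinely sees about 250 ms, `classify(250, 0, "200,20%", "500,60%")` returns "WARNING", while `classify(250, 0, "400,20%", "800,60%")` returns "OK". Note that the packet-loss component still catches a dead link regardless of how far the round-trip threshold is raised.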

Otherwise, I would say that the approaches you mention are what you
generally have to wind up doing.  For instance, we try to have standard
thresholds for CPU alerting on Windows servers, but we have some
reporting servers that can peg the CPU for 30 minutes.  So the teams
who own those servers have asked us to raise the threshold for hard
criticals to 45 consecutive failures (roughly 45 minutes with the way we
schedule checks to run).
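The 45-failure figure translates into time roughly as follows. After a service's first non-OK result, Nagios re-checks it every `retry_interval` until `max_check_attempts` consecutive failures make the state hard, and notifications only go out then. A back-of-the-envelope Python sketch, assuming the default `interval_length` of 60 seconds (so intervals are in minutes) and ignoring check latency and execution time; the function name is hypothetical.

```python
def minutes_to_hard_state(max_check_attempts: int, check_interval: float,
                          retry_interval: float) -> tuple[float, float]:
    """Estimate (best, worst) minutes from the onset of a failure to a hard state.

    Assumes interval_length=60 and ignores latency and check run time.
    """
    # After the first failed check, retries happen every retry_interval until
    # max_check_attempts consecutive failures have been seen.
    retries = (max_check_attempts - 1) * retry_interval
    # Best case: the failure starts just before a scheduled check.
    # Worst case: it starts just after one, adding up to check_interval.
    return retries, retries + check_interval
```

With 45 attempts and 1-minute intervals this gives 44 to 45 minutes, matching the "roughly 45 minutes" above. The same arithmetic explains the slow alerting on the Beijing link in the original question: every extra attempt adds one `retry_interval` before a real outage produces a notification.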

So you kind of have to take each check on a case-by-case basis,
usually after you've seen the failures, dug into what the exact issue
was, and determined what, if anything, would resolve it.

One other example.   We do some checks of Oracle databases and the way
the Oracle client libraries work, if a database is down, the library could
make the code wait for 5 minutes before it returns anything.   Obviously
that's sort of a problem for Nagios in terms of scheduling, latency, and
check execution times.  So the solution was to modify the code itself
to have a timeout that kills the Oracle connection attempt and aborts the
check script.
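A minimal Python sketch of the same goal, using a different technique from the one described: instead of adding a timeout inside the Oracle client code, the slow query runs in a child process that is killed if it overruns. The probe command is a placeholder, and whether a timeout should report CRITICAL or UNKNOWN (exit 3) is a local policy choice; the function name is hypothetical.

```python
import subprocess

STATE_OK, STATE_CRITICAL = 0, 2  # standard Nagios plugin exit codes


def run_with_timeout(argv: list[str], timeout_s: float) -> int:
    """Run a possibly-blocking probe in a child process; kill it if it overruns.

    Prints a one-line plugin status and returns the plugin exit code.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child before re-raising.
        print(f"CRITICAL - probe did not finish within {timeout_s:g}s")
        return STATE_CRITICAL
    if result.returncode != 0:
        print(f"CRITICAL - probe exited with status {result.returncode}")
        return STATE_CRITICAL
    print("OK - probe completed")
    return STATE_OK
```

A real check script would end with something like `sys.exit(run_with_timeout([...], 30))`, keeping the timeout comfortably below Nagios's own `service_check_timeout`. A probe that spawns its own children may additionally need to be started in its own process group so the whole tree can be killed.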

As a related thing from that last example, if you're using check_nrpe and
you're getting timeouts, you *could* increase the timeout value, but
again, that has implications for your Nagios server if those checks run
too long.  You usually want checks to execute quickly.


-----Original Message-----
From: David Dyer-Bennet [] 
Sent: Thursday, March 01, 2012 4:38 PM
Subject: [Nagios-users] Transient errors

I see a lot of transient errors on services and hosts I'm monitoring. 
Hence finding ways to keep notifications from going out for situations that
will resolve themselves is something of an issue.

I've played with how many failures in a row are needed to cause a
notification, and have that set differently for things I'm monitoring
across long links (Beijing, say) compared to things I'm monitoring locally
or in New York.  Of course, one problem with that is that it makes it take
longer before a real problem causes a notification.  Right now it takes
over 15 minutes for the total failure of our link to Beijing to cause a notification.

For things that are numeric values, I can play with the critical and
warning ranges to potentially reduce false positives.  That, at least,
doesn't slow down recognition of total failures.   Some things just don't
seem to fit the Nagios model -- for example it's quite normal for the SQL
server to pull 100% of the cpu for periods now and then, but if it goes on
too long, *that's* unusual.  Hmm; I suppose I could override the number of
failures needed to cause a notification in the service definition for
those, couldn't I? There may be some things I should just stop monitoring
(there aren't clear-cut "okay" and "bad" behaviors that I can quantify).

I guess I'm wondering if there are useful basic approaches to handling
this problem that I'm missing, or if I just need to work through the
details more carefully.   I'm startled at how often I get isolated
failures for no apparent reason.  Is that normal for most people
monitoring services?  I think I'm finding my connections time out now and
then due simply to load, without the load actually being at all high.
David Dyer-Bennet


Re: [Nagios-users] Have we reached some kind of Nagios limit?

2012-02-20 Thread Frost, Mark {BIS}
Thanks, Sven.  I'm almost certain you're correct.  For some reason I had 
thought that when I turned on large_installation_tweaks some time ago, 
environment variables were turned off.   However, now I see that it only turns 
off summary macros.   Not sure how I misinterpreted that.

So it would make sense that, for this particular collection of hosts 
and services, Nagios was probably creating such a large set of environment 
variables that it was exceeding a shell limit and preventing the 
exec'd check from executing properly.  I did some preliminary tests, and turning 
that off cleared things up.  And of course, in addition to fixing this issue, I 
believe I'm going to get a performance boost (or at least a drop in resource 
usage) as an added bonus.
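To get a feel for how close an environment is to that limit, here is a Python sketch that sizes an argv/environment pair before exec. On Linux, execve() fails with E2BIG if the total exceeds ARG_MAX or if any single string exceeds MAX_ARG_STRLEN, and a failed exec in the check's child process would plausibly surface as the 127 return code discussed in this thread. That connection, and the idea that one very large macro such as a servicegroup member list (exported as NAGIOS_SERVICEGROUPMEMBERS when environment macros are on) could be what tipped things over, are assumptions, not something the thread confirmed. The function names are hypothetical.

```python
import os

# Linux caps any single argv/envp string at MAX_ARG_STRLEN, which is
# 32 pages (131072 bytes with 4 KiB pages). Treat this as an assumption
# on other platforms.
MAX_ARG_STRLEN = 131072


def exec_payload_bytes(argv: list[str], env: dict[str, str]) -> int:
    """Approximate bytes execve() must copy for argv and the environment.

    Counts each string plus its NUL terminator; ignores the pointer arrays.
    """
    strings = list(argv) + [f"{key}={value}" for key, value in env.items()]
    return sum(len(s.encode()) + 1 for s in strings)


def oversized_vars(env: dict[str, str], limit: int = MAX_ARG_STRLEN) -> list[str]:
    """Names of variables whose 'NAME=value' string alone exceeds the limit."""
    return [key for key, value in env.items()
            if len(f"{key}={value}".encode()) + 1 > limit]


def exec_limits_report(argv: list[str], env: dict[str, str]) -> dict:
    """Compare a prospective exec against this host's ARG_MAX and per-string cap."""
    arg_max = os.sysconf("SC_ARG_MAX")  # total argv+envp budget on this host
    total = exec_payload_bytes(argv, env)
    return {"total_bytes": total, "arg_max": arg_max,
            "fits": total < arg_max and not oversized_vars(env)}
```

Feeding `exec_limits_report()` a copy of `os.environ` plus simulated NAGIOS_* entries is one way to test a theory like this on a non-production box before changing settings.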

Unfortunately, the only place I currently use environment variables is in 
several event scripts.   Changing those scripts to use command-line arguments 
is proving to be rather a pain in the butt given how many variables I have them 
check.   But I'm getting there.

Thanks very much for your help!


-----Original Message-----
From: Sven Nierlein [] 
Sent: Saturday, February 18, 2012 2:05 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Have we reached some kind of Nagios limit?

On 2/18/12 18:48, Frost, Mark {BIS} wrote:
> ...
> I added maybe 5 of these new hosts, ran the pre-flight check and restarted.  
> After the restart I started noticing that our failing service checks (for all 
> services) went from around 260 to over 4K.  All of those new failing checks 
> were only on hosts of this same type (that particular application on Windows 
> servers I mentioned above which is also what these new hosts were part of) 
> and they were all reporting the same failure condition:
> (Return code of 127 is out of bounds - plugin may be missing)
> ...
> What can I do?

Disable environment macros. You've hit the maximum length of a new shell 
command, which can get pretty huge when using env macros.
It's strongly advised to turn them off when using mklivestatus anyway.
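The switch in question is the `enable_environment_macros` directive in nagios.cfg; setting it to 0 stops Nagios from exporting NAGIOS_* environment variables to the commands it runs. The Python sketch below reports the effective setting from the file's contents. It assumes simple `key=value` lines with `#` or `;` comments, and that the directive defaults to enabled on Nagios 3.x when it is missing; the function name is hypothetical.

```python
def environment_macros_enabled(cfg_text: str, default: bool = True) -> bool:
    """Effective enable_environment_macros setting from nagios.cfg contents.

    `default` is used when the directive is absent; True reflects the
    Nagios 3.x behaviour assumed here.
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        key, sep, val = line.partition("=")
        if sep and key.strip() == "enable_environment_macros":
            value = val.strip()  # if repeated, the last occurrence wins (assumption)
    return default if value is None else value == "1"
```

Usage would be along the lines of `environment_macros_enabled(Path("/usr/local/nagios/etc/nagios.cfg").read_text())` with `from pathlib import Path`, where the path is the source-install default and may differ on packaged installs.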



[Nagios-users] Have we reached some kind of Nagios limit?

2012-02-18 Thread Frost, Mark {BIS}
A couple of days ago, I ran into a problem I've never seen before.  We run a 
single large instance with mostly very heterogeneous checks and host types.  
One particular group of Windows hosts, however, is very similar, and like most of 
our other checks, these hosts rely on templates.  I needed to add 10 
more hosts of this particular type, and typically all I have to do is 
define the hosts; the service checks happen automatically because the host 
templates put them in a hostgroup to which all the relevant checks are assigned.

I added maybe 5 of these new hosts, ran the pre-flight check and restarted.  
After the restart I started noticing that our failing service checks (for all 
services) went from around 260 to over 4K.  All of those new failing checks 
were only on hosts of this same type (that particular application on Windows 
servers I mentioned above which is also what these new hosts were part of) and 
they were all reporting the same failure condition:

(Return code of 127 is out of bounds - plugin may be missing)

Now ordinarily this would indicate a client-side issue, but there isn't one.  I 
can validate that by running check_nrpe manually against any of these hosts.   
I could imagine a typo that would cause this, particularly against other existing 
hosts that had not been touched, but I double-checked and did not find one (I 
was just adding host definitions to this group - nothing else).
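Nagios expects plugins to exit 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); anything else is reported as out of bounds. 127 is also the status a shell, or a child process whose exec failed, conventionally returns when a command could not be run at all, which is why Nagios guesses the plugin is missing even when check_nrpe works fine by hand (Sven's reply in this thread points at an oversized environment instead). A small Python sketch of that interpretation; the 126 wording and the function name are assumptions for illustration.

```python
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Conventional shell statuses: 126 = found but not executable,
# 127 = could not be found or executed at all.
EXEC_HINTS = {126: "plugin may not be executable", 127: "plugin may be missing"}


def describe_return_code(rc: int) -> str:
    """Translate a check's exit status into a state name or an out-of-bounds note."""
    if rc in PLUGIN_STATES:
        return PLUGIN_STATES[rc]
    message = f"Return code of {rc} is out of bounds"
    hint = EXEC_HINTS.get(rc)
    return f"{message} - {hint}" if hint else message
```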

I cloned this environment and went to play with it in a non-production instance 
that was identical to the production Nagios instance except for a slightly newer 
version of Merlin in the backend (1.1.14 for the non-prod instance, 1.1.13-something 
for the production one), but both used the same Nagios 3.3.1 + 
downtime locking patches.   I was able to reproduce the situation and after a 
couple of days of trial and error I've still not been able to completely 
isolate the issue, but I've determined that

-   it has nothing to do with the mk-livestatus module (I turned it 
off and back on), but the module has been very helpful in figuring out which of 
the 13K+ services and 1200+ hosts are impacted
-   it doesn't seem to be about adding random hosts and services.   I can 
add others and this doesn't happen
-   the host definition uses a template that puts the host in a hostgroup.  
Those hostgroups are then used in service definitions (12-15 services, 
depending on which group).   I had thought that perhaps if the hostgroup_name 
line of the service definition expanded to too many hosts that could be the 
problem.  I broke the service definitions down into 2 definitions, one for each 
production hostgroup rather than combining them and that didn't matter.
-   the service templates that the service definitions use for these hosts 
all add them to a common servicegroup.  My current line of thinking leads me to 
believe it's got something to do with this.   With a particular test scenario I 
created where I create a new host, but exclude it from the hostgroup 
definitions and instead manually create service definitions for this host (I 
know this "one more host" is right on the cusp of this problem), I find that 
when I add it so the 4,331st service gets added to the servicegroup, the 
problem starts.  If I remove that from that host's service definition all the 
other hosts' services recover.   However, based on this thinking, if I just 
comment out the servicegroup add from the service template these hosts use, the 
problem should stop - it doesn't.
-   the only affected services are on hosts in the hostgroups I'm changing.   
Other unrelated hosts and services are unaffected.   There are 3 hostgroups: 
Production Appname Hosts 1, Production Appname Hosts 2, and All Appname Hosts 
which is obviously a combination of the two.   All Appname Hosts is around 324 
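Finding an exact tipping point like the 4,331st servicegroup member by hand takes many restarts. If the failure is monotonic in the number of members (once some count fails, every larger count fails too, which is itself an assumption), a binary search needs only about log2 of the range. A Python sketch, where `fails(n)` is a hypothetical callback you would write to generate a config with n members, run the pre-flight check, restart a test instance, and look for 127 results:

```python
from collections.abc import Callable


def first_failing_count(fails: Callable[[int], bool], lo: int, hi: int) -> int:
    """Smallest n in [lo, hi] for which fails(n) is True.

    Assumes fails() is monotonic: once a size triggers the problem,
    every larger size does too.
    """
    if not fails(hi):
        raise ValueError("problem not reproduced at the upper bound")
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # mid fails, so the threshold is at or below mid
        else:
            lo = mid + 1  # mid works, so the threshold is above mid
    return lo
```

Searching 1 to 13,000 this way takes at most 15 calls to `fails()`, including the initial check of the upper bound.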

I'm not really sure what to try at this point.  It does seem like I've hit some 
kind of internal limitation in Nagios, but I don't know how to determine 
anything else about it beyond this.  If I were able to isolate this completely 
to, say, adding members to a single servicegroup, I could avoid that and 
continue adding hosts as we need them, but so far I have not been able to find 
such a workaround.   If there is a limitation like this, it would, of course, be 
nice for the pre-flight check to tell me that I can't have more than X members 
in a servicegroup or something.

Other info:

Nagios version: Nagios 3.3.1 with locking patches
Merlin backend: 1.1.13+ (production), 1.1.14 (test)
MK-Livestatus module 1.1.12p6 installed (uninstalling it makes no difference)
OS: SLES 11.1 Linux, 64-bit
Memory: 12GB
CPU: 2x 2.4GHz quad-core Xeon

What can I do?


