Re: [Nagios-users] problem with newly created checkcommand and service

2011-11-22 Thread Claudio Kuenzler
Hi,

Please show the command definition of database_connection-time.

Furthermore in the service definition you use the following line:

check_command   database_connection-time!

Didn't you want to pass arguments to the command? If they're already
hardcoded in the command definition you can leave the exclamation mark off.
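To illustrate the convention: in a check_command value, the text before the first ! names the command, and each further !-separated field is substituted for $ARG1$, $ARG2$, and so on in the command definition. The Python sketch below is a hypothetical illustration of that splitting rule. split_check_command is not part of Nagios, and this is not the core's own parser. Under this rule a trailing ! produces one empty argument.

```python
def split_check_command(value: str) -> tuple[str, dict[str, str]]:
    """Split a check_command value into the command name and $ARGn$ macros.

    Hypothetical illustration of the '!' separator convention only;
    this is not the parsing code used by the Nagios core.
    """
    name, *fields = value.strip().split("!")
    # Field 1 becomes $ARG1$, field 2 becomes $ARG2$, and so on.
    args = {f"$ARG{i}$": field for i, field in enumerate(fields, start=1)}
    return name, args


if __name__ == "__main__":
    # Same command name either way; the trailing '!' adds an empty $ARG1$.
    print(split_check_command("database_connection-time!"))
    print(split_check_command("database_connection-time"))
```

If the arguments are hardcoded in the command definition, the second form is the cleaner one.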

On Tue, Nov 22, 2011 at 7:29 PM, Kaplan, Andrew H. wrote:

>
> Hi there --
>
> I am going through the motions of adding a new checkcommand and service
> to the Nagios server. The command involves the check_mssql_health plugin,
> which runs on the Nagios server. The plugin gets its information via
> queries to a particular port on the Microsoft SQL server. Here are its
> particulars:
>
> /usr/local/nagios/libexec/check_mssql_health --server=
> --username= --password= --port=
> --mode=connection-time
>
> The name of the checkcommand is: database_connection-time
>
> Once the checkcommand was created, so was the service. The configuration
> of the service in question, taken from the services.cfg file, is shown
> below:
>
> define service {
> service_description Database Connection Time
> check_command   database_connection-time!
> host_name   
> check_period    24x7
> contact_groups  nt-admins,linux-admins,admins
> event_handler_enabled   0
> active_checks_enabled   1
> passive_checks_enabled  0
> notifications_enabled   1
> check_freshness 0
> freshness_threshold 86400
> use generic-service
> }
>
> To verify that the new configuration would work, the following command
> was run:
>
> /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
>
> It produced this error message:
>
>    Checking services...
>    Error: Service check command '_connection-time' specified in
>    service 'Database Connection Time' for host '' not defined
>    anywhere!
>
> I verified the syntax of the command in the checkcommands.cfg file,
> including the name given to the command. Why would Nagios think the
> service check command is not defined, and return this error?
>
>
>
>
> The information in this e-mail is intended only for the person to whom it
> is addressed. If you believe this e-mail was sent to you in error and the
> e-mail contains patient information, please contact the Partners Compliance
> HelpLine at http://www.partners.org/complianceline . If the e-mail was sent
> to you in error but does not contain patient information, please contact
> the sender and properly dispose of the e-mail.
>
>
> --
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> ___
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>

[Nagios-users] problem with newly created checkcommand and service

2011-11-22 Thread Kaplan, Andrew H.
Hi there --

I am going through the motions of adding a new checkcommand and service to the
Nagios server. The command involves the check_mssql_health plugin, which runs on
the Nagios server. The plugin gets its information via queries to a particular
port on the Microsoft SQL server. Here are its particulars:

/usr/local/nagios/libexec/check_mssql_health --server=
--username= --password= --port= --mode=connection-time


The name of the checkcommand is: database_connection-time

Once the checkcommand was created, so was the service. The configuration of the
service in question, taken from the services.cfg file, is shown below:

define service {
service_description Database Connection Time
check_command   database_connection-time!
host_name   
check_period    24x7
contact_groups  nt-admins,linux-admins,admins
event_handler_enabled   0
active_checks_enabled   1
passive_checks_enabled  0
notifications_enabled   1
check_freshness 0
freshness_threshold 86400
use generic-service
}

To verify that the new configuration would work, the following command was run:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

It produced this error message:

Checking services...
Error: Service check command '_connection-time' specified in
service 'Database Connection Time' for host '' not defined
anywhere!

I verified the syntax of the command in the checkcommands.cfg file, including
the name given to the command. Why would Nagios think the service check command
is not defined, and return this error?






Re: [Nagios-users] [PATCH] reduce notification load; fix $NOTIFICATIONRECIPIENTS$ macro #98

2011-11-22 Thread Michael Friedrich
On 22.11.2011 10:31, Andreas Ericsson wrote:
> On 11/22/2011 01:02 AM, Michael Friedrich wrote:
>> On 21.11.2011 20:56, Andreas Ericsson wrote:
>>> On 11/01/2011 02:05 PM, Michael Friedrich wrote:
>>>> hi,
>>>>
>>>> recently we've been debugging on team icinga in the middle of
>>>> notifications and macros, and while investigating a user's problem,
>>>> we've dug a bit deeper into the notification viability checks,
>>>> resulting in deeper analysis of an Opsview patch to reduce the
>>>> notification load significantly by moving the viability checks from
>>>> the actual notification into the creation of the contacts notified,
>>>> passing only a list of 'qualified' contacts to the actual
>>>> notification logic. the only thing to remark here is that the
>>>> checks against the valid notification_period now happen sooner, and
>>>> not when the notification is actually sent to each contact.
>>>>
>>>> while implementing that patch into current code (needs some macro
>>>> passing with current code), we remembered nagios bug #98, where the
>>>> $NOTIFICATIONRECIPIENTS$ macro is expected to be populated only with
>>>> the contacts actually being notified, not all of those assigned to
>>>> the host/service. while this is considered to be a real bug, further
>>>> investigation showed that thanks to the viability checks before
>>>> calling add_notification(), contacts won't be added to that macro, as
>>>> the macro logic happens within that function too.
>>>>
>>>> so by applying the attached git patch, you will a. reduce
>>>> notification load and b. fix the $NOTIFICATIONRECIPIENTS$ macro,
>>>> which holds all contacts rather than only the viable ones.
>>>>
>>>> since the code is essentially the same on icinga and nagios at this
>>>> stage, the tests can be found at the icinga dev tracker as usual.
>>>> https://dev.icinga.org/issues/1744
>>>> https://dev.icinga.org/issues/2023

>>>
>>> I've started looking into this patch right now. It's good to get that
>>> issue (#98) fixed, but I fail to see any noticeable performance
>>> improvement. All contacts potentially viable for being contacted are
>>> still looked at, but the difference with this patch is that it checks
>>> the viability before shipping it off to add_notification(), which does
>>> fix issue 98 but at the expense of quite a lot of code duplication.
>>
>> normally, all contacts would have been added to the notification_list in
>> memory, even those not actually passing the viability checks. but at
>> this stage of the code, nobody is aware of that so the list gets
>> populated either way by calling add_notification().
>>
>> /* add all individual contacts for this host */
>>   ^^^
>>
>> having that notification_list created, this remains fully linked in
>> memory. let's say you have some 1k contacts for that
>> service, and actually the alarm would hit only those in the nonworkhours
>> or workhours timeperiod and only on critical, for the ops team e.g.
>> so by looping through the notification_list, you will encounter *all*
>> contacts for that host, only the duplicates have been removed.
>>
>> /* notify each contact (duplicates have been removed) */
>>
>> then you'll fire up the actual notification with calling
>> notify_contact_of_host - and actually in there, the current core checks
>> the viability for the contact to be notified.
>>
>> you are right, if each contact gets notified 24x7 on all
>> notification_options, the algorithm stays the same. but if you happen to
>> have a lot of different contacts assigned to hosts and services who don't
>> get notified each time a notification is triggered, the overall amount
>> of looping through notification_list will be shorter and save some cpu
>> cycles; on larger systems probably more than just a few, since the loop
>> at the end of the line then only visits contacts that will actually be
>> notified.
>>
>
> Right, but all contacts are still checked for viability, so the amount
> of looping is reduced once for all those who aren't viable, while the
> number of viability checks (which I presume is the expensive part) will
> remain the same.

from that point of view you are absolutely right. thanks for clarification.

>
>> furthermore, where do you get the idea of code duplication from? the
>> only change made by this patch is moving the viability checks
>> and therefore passing an additional function parameter which makes the
>> diff a bit more bloated than it should be.
>>
>
> The fact that the patch introduces eight locations with identical code
> headed with "check now if contact can be notified".
>
> The proper way to do this would be to introduce create_recipient_list(),
> passing it all the variables it needs to produce a list of recipients
> that have duplicates removed *and* are viable for being contacted. If a
> lot of code still has to be duplicated (as in the patch), more helpers
> in the form of add_recipient_for_service(&mac,

Re: [Nagios-users] e-mail notifications not being sent

2011-11-22 Thread Claudio Kuenzler
Normally the mail binary takes the hostname into account.
Check the following files to verify that the hostname is set up correctly:
/etc/hosts
/etc/HOSTNAME

You might also want to take a look at .mailrc if the hostname is still not
shown.
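A quick way to see which names the system itself reports is Python's socket module. This is a sketch under the assumption that the sender domain is derived from the system hostname; Postfix, the MTA on this server, can also set it independently through the myhostname and myorigin parameters in main.cf.

```python
import socket


def reported_names() -> dict[str, str]:
    """Return the short hostname and the fully qualified name the resolver finds."""
    short = socket.gethostname()
    # getfqdn() asks the resolver (which normally consults /etc/hosts) for a
    # name containing a dot; if none is found, it returns `short` unchanged.
    fqdn = socket.getfqdn(short)
    return {"hostname": short, "fqdn": fqdn}


if __name__ == "__main__":
    names = reported_names()
    print(f"hostname: {names['hostname']}")
    print(f"fqdn:     {names['fqdn']}")
    # A result ending in .localdomain suggests /etc/hosts maps the host
    # to a placeholder domain.
    if names["fqdn"].endswith(".localdomain"):
        print("FQDN ends in .localdomain; check /etc/hosts")
```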

On Mon, Nov 21, 2011 at 10:17 PM, Kaplan, Andrew H. wrote:

> Hi there --
>
> I checked the nagios.log file, and the problem was that the mail binary
> was not in the /bin folder. I created a symbolic link at that location
> pointing to the /usr/bin/mail binary. Once that was done, notifications
> were sent to the recipient.
>
> As a follow-up, the sender address of the Nagios server is shown as
> nagios@localdomain. I would like to change that to reflect the name of
> the server. What file(s) do I need to modify to make that happen?
>
> Thanks.
>
>  --
> *From:* Claudio Kuenzler [mailto:c...@claudiokuenzler.com]
> *Sent:* Monday, November 21, 2011 3:25 PM
> *To:* Nagios Users List
> *Subject:* Re: [Nagios-users] e-mail notifications not being sent
>
> Maybe you have to replace the mail program with something available on
> your system, e.g. /usr/bin/mailx.
> That's always one of the first things I change in a new Nagios
> installation.
> What OS are you using? Try to install the required programs
> (mail/mailx...) if they can't be found in your system.
>
> On Mon, Nov 21, 2011 at 8:25 PM, Kaplan, Andrew H. wrote:
>
>>
>> Hi there --
>>
>> I completed the installation of Nagios 3.3.1, and I am going through the
>> testing process. The server is able to successfully monitor our various
>> clients, but e-mail notifications for critical conditions are not being
>> sent to the intended recipient. The e-mail server running on the Nagios
>> server is the Postfix message transfer agent.
>>
>> The troubleshooting steps that I have taken so far are the following:
>>
>> 1. I have been able to send a test e-mail from the Nagios server to the
>> intended recipient using two different mail commands. The test e-mail
>> was sent from the command line with the mail and mailx commands, using
>> the syntax:
>>
>> mail  < /etc/fstab
>> mailx -s "test"  < /etc/fstab
>>
>> 2. I checked the contacts.cfg file, and confirmed the intended recipient
>> is listed with the correct address.
>>
>> 3. I checked the nagios.log, and there were entries that were similar to
>> the following:
>>
>> [1321385615] Warning: Attempting to execute the command "/usr/bin/printf
>> "%b" "* Nagios 2.6 *\n\nNotification Type: PROBLEM\nHost: ...
>> Date/Time: Tue Nov 15 14:33:35 EST 2011\n" | /bin/mail -…" resulted in a
>> return code of 127.  Make sure the script or binary you are trying to
>> execute actually exists…
>>
>> I did a search for the mail binary, and there was none at that location.
>> To correct the problem, I created a symbolic link that points to the
>> actual location of the mail binary, in the /usr/bin/ folder.
>>
>> 4. As far as I can tell, all hosts have e-mail notifications enabled on
>> them.
>>
>> What other steps do I need to take in order to get e-mail notifications
>> to work here?
>>
>> Thanks.
>>
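The return code of 127 in the nagios.log entry quoted above is the status a POSIX shell reports when the command it was asked to run cannot be found, which fits the missing /bin/mail. Because the logged notification command contains a pipe, it is presumably handed to /bin/sh; the short Python sketch assumes that and reproduces the status with a deliberately nonexistent path (/nonexistent/mail is a placeholder, and run_notification is a hypothetical helper, not part of Nagios).

```python
import subprocess


def run_notification(command: str) -> int:
    """Run a command line through /bin/sh and return its exit status.

    A POSIX shell exits with 127 when the command it was asked to run
    cannot be found; for a pipeline, the last command's status counts.
    """
    completed = subprocess.run(["/bin/sh", "-c", command],
                               capture_output=True, text=True)
    return completed.returncode


if __name__ == "__main__":
    # Placeholder path standing in for the missing /bin/mail.
    status = run_notification('printf "%b" "test\\n" | /nonexistent/mail -s test root')
    print(status)
```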

Re: [Nagios-users] [PATCH] reduce notification load; fix $NOTIFICATIONRECIPIENTS$ macro #98

2011-11-22 Thread Andreas Ericsson
On 11/22/2011 01:02 AM, Michael Friedrich wrote:
> On 21.11.2011 20:56, Andreas Ericsson wrote:
>> On 11/01/2011 02:05 PM, Michael Friedrich wrote:
>>> hi,
>>>
>>> recently we've been debugging on team icinga in the middle of
>>> notifications and macros, and while investigating a user's problem,
>>> we've dug a bit deeper into the notification viability checks,
>>> resulting in deeper analysis of an Opsview patch to reduce the
>>> notification load significantly by moving the viability checks from
>>> the actual notification into the creation of the contacts notified,
>>> passing only a list of 'qualified' contacts to the actual
>>> notification logic. the only thing to remark here is that the
>>> checks against the valid notification_period now happen sooner, and
>>> not when the notification is actually sent to each contact.
>>>
>>> while implementing that patch into current code (needs some macro
>>> passing with current code), we remembered nagios bug #98, where the
>>> $NOTIFICATIONRECIPIENTS$ macro is expected to be populated only with
>>> the contacts actually being notified, not all of those assigned to
>>> the host/service. while this is considered to be a real bug, further
>>> investigation showed that thanks to the viability checks before
>>> calling add_notification(), contacts won't be added to that macro, as
>>> the macro logic happens within that function too.
>>>
>>> so by applying the attached git patch, you will a. reduce
>>> notification load and b. fix the $NOTIFICATIONRECIPIENTS$ macro,
>>> which holds all contacts rather than only the viable ones.
>>>
>>> since the code is essentially the same on icinga and nagios at this
>>> stage, the tests can be found at the icinga dev tracker as usual.
>>> https://dev.icinga.org/issues/1744
>>> https://dev.icinga.org/issues/2023
>>>
>>
>> I've started looking into this patch right now. It's good to get that
>> issue (#98) fixed, but I fail to see any noticeable performance
>> improvement. All contacts potentially viable for being contacted are
>> still looked at, but the difference with this patch is that it checks
>> the viability before shipping it off to add_notification(), which does
>> fix issue 98 but at the expense of quite a lot of code duplication.
> 
> normally, all contacts would have been added to the notification_list in
> memory, even those not actually passing the viability checks. but at
> this stage of the code, nobody is aware of that so the list gets
> populated either way by calling add_notification().
> 
> /* add all individual contacts for this host */
>  ^^^
> 
> having that notification_list created, this remains fully linked in
> memory. let's say you have some 1k contacts for that
> service, and actually the alarm would hit only those in the nonworkhours
> or workhours timeperiod and only on critical, for the ops team e.g.
> so by looping through the notification_list, you will encounter *all*
> contacts for that host, only the duplicates have been removed.
> 
> /* notify each contact (duplicates have been removed) */
> 
> then you'll fire up the actual notification with calling
> notify_contact_of_host - and actually in there, the current core checks
> the viability for the contact to be notified.
> 
> you are right, if each contact gets notified 24x7 on all
> notification_options, the algorithm stays the same. but if you happen to
> have a lot of different contacts assigned to hosts and services who don't
> get notified each time a notification is triggered, the overall amount
> of looping through notification_list will be shorter and save some cpu
> cycles; on larger systems probably more than just a few, since the loop
> at the end of the line then only visits contacts that will actually be
> notified.
> 

Right, but all contacts are still checked for viability, so the amount
of looping is reduced once for all those who aren't viable, while the
number of viability checks (which I presume is the expensive part) will
remain the same.
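The trade-off described here can be made concrete with a toy model. It is written in Python rather than the core's C, the contacts and their viability flags are hypothetical, and the input is assumed to be deduplicated already. Checking viability late (current core) or early (the patch) gives the same recipients and the same number of viability checks; only the notify loop gets shorter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    viable: bool  # hypothetical stand-in for period/state/option checks


def check_late(contacts):
    """List every contact, then check viability while notifying."""
    checks = iterations = 0
    notification_list = []
    for c in contacts:            # build the list (input already deduplicated)
        iterations += 1
        notification_list.append(c)
    notified = []
    for c in notification_list:   # notify loop visits every listed contact
        iterations += 1
        checks += 1
        if c.viable:
            notified.append(c.name)
    return notified, checks, iterations


def check_early(contacts):
    """Check viability while building the list; list only viable contacts."""
    checks = iterations = 0
    notification_list = []
    for c in contacts:
        iterations += 1
        checks += 1
        if c.viable:
            notification_list.append(c)
    notified = []
    for c in notification_list:   # notify loop visits viable contacts only
        iterations += 1
        notified.append(c.name)
    return notified, checks, iterations


if __name__ == "__main__":
    # Hypothetical: 1000 contacts, of which only 50 should get this alert.
    contacts = [Contact(f"c{i}", viable=(i < 50)) for i in range(1000)]
    print(check_late(contacts)[1:])   # (1000, 2000)
    print(check_early(contacts)[1:])  # (1000, 1050)
```

Each tuple is (viability checks, loop iterations): the checks stay at 1000 in both cases, while the iterations drop by the number of non-viable contacts.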

> furthermore, where do you get the idea of code duplication from? the
> only change made by this patch is moving the viability checks
> and therefore passing an additional function parameter which makes the
> diff a bit more bloated than it should be.
> 

The fact that the patch introduces eight locations with identical code
headed with "check now if contact can be notified".

The proper way to do this would be to introduce create_recipient_list(),
passing it all the variables it needs to produce a list of recipients
that have duplicates removed *and* are viable for being contacted. If a
lot of code still has to be duplicated (as in the patch), more helpers
in the form of add_recipient_for_service(&mac, srv, cntct) would be
nifty so the viability check can be moved there without breaking the
ABI for create_notification_list_from_{host,service}().
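The helper structure proposed here could look roughly like the following sketch. It is written in Python rather than the core's C. Only the names create_recipient_list() and add_recipient_for_service() come from the message; their parameters, the Contact fields, is_viable() and notification_recipients_macro() are hypothetical stand-ins. The point is that the duplicate check and the viability check live in one helper instead of being repeated at every call site, and the $NOTIFICATIONRECIPIENTS$ value (a comma-separated list of contact names) is built only from contacts that will actually be notified, as bug #98 asks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contact:
    name: str
    on_duty: bool  # hypothetical: inside the contact's notification_period
    notify_states: frozenset = field(default_factory=frozenset)


def is_viable(contact, state):
    """Hypothetical stand-in for the core's viability checks."""
    return contact.on_duty and state in contact.notify_states


def add_recipient_for_service(recipients, seen, contact, state):
    """Single place for both the duplicate check and the viability check."""
    if contact.name in seen:
        return
    seen.add(contact.name)
    if is_viable(contact, state):
        recipients.append(contact)


def create_recipient_list(individual, group_members, state):
    """Return contacts that are deduplicated *and* viable for `state`."""
    recipients, seen = [], set()
    for contact in [*individual, *group_members]:
        add_recipient_for_service(recipients, seen, contact, state)
    return recipients


def notification_recipients_macro(recipients):
    """Comma-separated short names, as in $NOTIFICATIONRECIPIENTS$."""
    return ",".join(c.name for c in recipients)


if __name__ == "__main__":
    ops = Contact("ops", on_duty=True, notify_states=frozenset({"CRITICAL"}))
    dev = Contact("dev", on_duty=False, notify_states=frozenset({"CRITICAL"}))
    mgr = Contact("mgr", on_duty=True, notify_states=frozenset({"WARNING"}))
    recipients = create_recipient_list([ops, dev], [ops, mgr], "CRITICAL")
    print(notification_recipients_macro(recipients))  # ops
```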

I'm in the middle of a release at $dayjob so I've had to postpone that
u