Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-16 Thread Jonathan Call
Sean;

I have a very large deployment so I use this tool:

http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal NSCA
daemon listens on the central server.
 
Jonathan

 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Sean McAvoy
 Sent: Monday, October 15, 2007 12:09 PM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Distributed monitoring Freshness
 checking failing then recovering
 
 On further investigation, it looks as though the problem is the time
 taken to submit the results back to Nagios via send_nsca.
 I have read about a couple of different options for getting results back
 quickly. One is a bulk transfer: a file containing the results is sent
 by a single send_nsca run executed from cron. The other makes use of the
 performance data output option on the remote Nagios systems and submits
 the results using a custom daemon on both ends.
 Does anybody know of any other options? Also, are there any guides to
 setting up either of these options? Most of what I have read is email
 threads.
 Thanks.
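
 The bulk-transfer option described above might be sketched roughly as
 follows. This is only an illustration, not a tested recipe from this
 thread: it assumes send_nsca is on the PATH and reads its default
 tab-delimited service-check format on stdin, and the function names,
 the config path and the host name are all made up for the example.

```python
import subprocess
from typing import Iterable, Tuple

# (host_name, service_description, return_code, plugin_output)
Result = Tuple[str, str, int, str]


def format_results(results: Iterable[Result]) -> str:
    """Render results as tab-delimited lines, one per service check,
    which is the default input format send_nsca reads on stdin."""
    lines = []
    for host, service, code, output in results:
        # Tabs or newlines inside the plugin output would corrupt the record.
        clean = output.replace("\t", " ").replace("\n", " ")
        lines.append(f"{host}\t{service}\t{code}\t{clean}\n")
    return "".join(lines)


def send_batch(results: Iterable[Result], central_host: str,
               config: str = "/etc/nagios/send_nsca.cfg") -> int:
    """Submit the whole batch with a single send_nsca invocation.

    The config path is an assumption; adjust it to the local install.
    """
    payload = format_results(results)
    if not payload:
        return 0  # nothing to send, so do not start send_nsca at all
    proc = subprocess.run(
        ["send_nsca", "-H", central_host, "-c", config],
        input=payload.encode(),
        check=False,
    )
    return proc.returncode
```

 Called from cron with a few hundred queued results, something like
 send_batch(queued, "central.example.org") would open one connection per
 batch rather than one per check result, which is where the time saving
 would be expected to come from.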
 
 On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
 
  Hello,
  I have 1 central nagios system with 5 distributed servers. I have
  enabled freshness checking on both central and remote systems. I am
  constantly seeing services go to unknown status for 1-3 minutes and
  then recover.
  On the remotes I have:
  check_service_freshness=1
  service_freshness_check_interval=10
  check_host_freshness=1
  host_freshness_check_interval=60
  service_inter_check_delay_method=s
  max_service_check_spread=10
  service_interleave_factor=1
  host_inter_check_delay_method=s
  max_host_check_spread=30
  max_concurrent_checks=0
 
  It does appear as though checks are being run in parallel. I'm
  wondering how I can best determine where the problem is: the execution
  of checks, submission to the central system, or something else.
  Thanks.
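
  One way to reason about short UNKNOWN periods like these is to compare
  the times at which passive results reach the central server with the
  freshness threshold for the service. The sketch below is a simplified
  model, not Nagios's actual algorithm: the arrival times and the
  threshold are invented numbers, and the function name is hypothetical.
  Nagios also only evaluates freshness every
  service_freshness_check_interval seconds, which this model ignores.

```python
from typing import List, Tuple


def stale_windows(arrival_times: List[int],
                  freshness_threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) intervals, in seconds, during which the newest
    result was older than freshness_threshold.

    arrival_times must be sorted ascending; each entry is the time a
    passive result reached the central server.
    """
    windows = []
    for prev, nxt in zip(arrival_times, arrival_times[1:]):
        expiry = prev + freshness_threshold
        if nxt > expiry:
            # The result from `prev` expired before the next one arrived.
            windows.append((expiry, nxt))
    return windows


# Hypothetical example: checks are scheduled every 300 s, but the third
# result is delayed in submission and arrives 420 s after the second.
print(stale_windows([0, 300, 720, 1020], freshness_threshold=330))
# prints [(630, 720)]
```

  In this made-up case a single slow submission leaves the service stale
  for 90 seconds before it recovers, which has the same shape as the
  behaviour described above. Comparing real arrival timestamps from the
  central server's log against the configured threshold in this way may
  help show whether submission delay is the cause.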
 
 
  _sean
 
 
  -------------------------------------------------------------------------
  This SF.net email is sponsored by: Splunk Inc.
  Still grepping through log files to find problems?  Stop.
  Now Search log events and configuration files using AJAX and a
  browser.
  Download your FREE copy of Splunk now  http://get.splunk.com/
  ___
  Nagios-users mailing list
  Nagios-users@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/nagios-users
  ::: Please include Nagios version, plugin version (-v) and OS when
  reporting any issue.
  ::: Messages without supporting info will risk being sent to /dev/null
 
 Sean McAvoy
 NOC Acting Team Lead
 Afilias Canada
 
 P. 416.673.4194
 
 
 
 




Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-16 Thread Live Great
Hi Jonathan,

Why not use check_by_ssh instead? 
Is there any pitfall (weakness) in using check_by_ssh compared to an agent like OCP?

Thanks
Sam

----- Original Message -----
From: Jonathan Call [EMAIL PROTECTED]
To: Sean McAvoy [EMAIL PROTECTED]; nagios-users@lists.sourceforge.net
Sent: Wednesday, October 17, 2007 7:19:46 AM
Subject: Re: [Nagios-users] Distributed monitoring Freshness checking failing
then recovering

Sean;

I have a very large deployment so I use this tool:

http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal NSCA
daemon listens on the central server.
 
Jonathan
