Re: [Nagios-users] DNX Version 0.15 Released!

2008-02-21 Thread Justin Hitt
John,

On Wed, Feb 20, 2008 at 11:14 AM, John Calcote [EMAIL PROTECTED] wrote:
  DNX is a modular extension of Nagios that offloads a significant
  portion of the work normally done by Nagios to a distributed network

Why not just off-load the Nagios checks to Condor, GNU Queue, PBS, or
some other distributed job system?  All of them support Perl and can be
set up to return output to a specific location (e.g. the Nagios
'checkresults' directory or the external commands queue).
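
For the external-commands route, here is a minimal Python sketch of what
a remote job could write back.  It assumes the standard
PROCESS_SERVICE_CHECK_RESULT external command format; the command file
path is the usual default for a source install and may differ on your
system, and the function names are only illustrative.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), "
                         "2 (CRITICAL) or 3 (UNKNOWN)")
    timestamp = int(time.time() if now is None else now)
    # Only the first line is sent: a newline would end the command early.
    first_line = output.splitlines()[0] if output else ""
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{first_line}\n")


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Append one command line to the Nagios command file (assumed path)."""
    with open(command_file, "a") as fifo:
        fifo.write(command_line)
```

A job would call submit(passive_service_result("web01", "HTTP", 0,
"HTTP OK")) on the Nagios box itself; from a remote box the same result
would have to travel over NSCA or a similar transport instead.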

Or even additional passive check drones that run Nagios checks in Xen
or VMware containers out in the environment, passing data back to your
Nagios console.  Don't get me wrong, DNX is a great and viable
concept; I just have these other resources already in production.

What types of distribution have people used to help Nagios stay on top
of checks?

Best,

Justin
-- 
Attention Sales And Marketing Professionals Who Serve B2B Executives
 http://hittpublishingdirect.com/

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] NagiosPluginsNT scaling to 400 sites?

2008-02-19 Thread Justin Hitt
Roger,

On Feb 19, 2008 8:58 AM, Roger [EMAIL PROTECTED] wrote:
 I'm wondering if one centralized Nagios server can use the NagiosPluginsNT
 project (http://tinyurl.com/2y8ykr) to effectively monitor certain critical
 internal services from that location.

I'm monitoring 3,100 hosts at one location and 215 at another with
very few issues.  As long as you move as much work as possible off the
server and out to the hosts (e.g. use NSCA or some other kind of
passive push), you'll be fine as your volume grows.

If you are using virtualization, then you need a balanced model where
you use passive checks only on the global zone or host operating
system, then light polling on individual zones.

The model you describe sounds like it will work.  There is a point
where you'll need to optimize your top-end hardware.  My 215-site
installation is a test bed in a Solaris zone on a single-processor
2.8 GHz P4 ... needless to say, it doesn't run as cleanly as the larger
installation.

Some common problems you'll face:
 -- Your 'checkresults' queue may grow in size with stale checks
because of the time it takes to cycle through all the hosts.  Passive
checks reporting to a main console will improve this, and so will a
beefier head server.
 -- Some of the Monitoring menus become useless.  For example, Host
Detail will take forever to load and Hostgroup Grid will kill
Internet Explorer.  It would be nice to disable these menu items.

Best,

Justin


Re: [Nagios-users] status.cgi very high cpu usage

2008-02-19 Thread Justin Hitt
Steve,

On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
 I have a problem with status.cgi taking up too much cpu so the page is very
 slow to  render. Is there any way to find out where the problem is?
 We have about 650 services monitored. The output os nagios -s command is

Many of the Monitoring reports don't work well at volume, so I've been
asking users to use only the Unhandled reports.  You may get a better
response in Mozilla, but 'status.cgi' can kill Internet Explorer
because it loads everything in one large list.

Nagios is at the point where it needs an SQL back end with a more
modular approach to how it stores site data.  Perhaps it could roll
status up into summary tables that are queried to create reports, and
go into the host tables only when someone drills down into host
information.

In production you'll want to be on a multi-core, multi-threaded
machine; 2 cores won't do it if you'll have more than one user in the
system.  Until then, keep users in the Unhandled menus around
{Service,Host} Problems.

Best,

Justin


Re: [Nagios-users] NagiosPluginsNT scaling to 400 sites?

2008-02-19 Thread Justin Hitt
Hari,

On Feb 19, 2008 10:12 AM, Hari Sekhon [EMAIL PROTECTED] wrote:
 If they don't appear in the sidebar, then you're unlikely to type in the
 url to hang the browser even if you knew what it was.

Excellent.  I commented out the Detail and Grid options in 'side.html'
... it definitely keeps people from bogging down the server with
'status.cgi' and still lets people drill down into specific selections.
Thank you.

Best,

Justin


Re: [Nagios-users] conditional inclusion of cfg files

2008-02-18 Thread Justin Hitt
Jonathan,

Jonathan Mills wrote:
  Okay, let's say you have lots of distributed pollers, which should
  only load cfg files for their particular environment (both hosts and
  services).  However, you'd like to manage the same set of global cfg

I set up configurations in subdirectories by datacenter (the location
of the distributed poller), then have a local 'sed' script change the
location of the configuration to the one relevant to that specific box.

These directories are also broken out individually ...

$NAGIOS_HOME/etc/corporate/$ASSET_GROUP/{hosts,groups,contact}.cfg
$NAGIOS_HOME/etc/datacenter/$CITY_STATE/{hosts,groups,contact}.cfg
$NAGIOS_HOME/etc/thirdparty/$VENDOR/{hosts,groups,contact}.cfg

All my pollers check a common pool (corporate, thirdparty) and
have local checks for their own site (datacenter).  $CITY_STATE
could also be the name of your host, which makes 'sed' a little easier.

Use 'cfg_dir=' entries in nagios.cfg, one per directory.

You could use 'cfengine'; I use 'sed'.  Either way, just comment out all
datacenters on your console and uncomment the one necessary for a
particular datacenter.  If you don't want to use 'sed' or 'cfengine'
then just uncomment it by hand.
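
As a rough illustration of the comment/uncomment step, here is a
minimal Python sketch that does what the 'sed' pass would do.  It
assumes each datacenter has its own 'cfg_dir=' line ending in
.../etc/datacenter/<site>; the function name and the site names in the
usage note are hypothetical.

```python
import re

# Matches an enabled or commented-out cfg_dir line for one datacenter.
DATACENTER_LINE = re.compile(
    r"^\s*#?\s*(cfg_dir=\S*/etc/datacenter/([^/\s]+)/?)\s*$"
)


def select_datacenter(config_text, site):
    """Comment out every datacenter cfg_dir line except the one for site."""
    result = []
    for line in config_text.splitlines():
        match = DATACENTER_LINE.match(line)
        if match is None:
            result.append(line)                  # not a datacenter entry
        elif match.group(2) == site:
            result.append(match.group(1))        # enable this site
        else:
            result.append("#" + match.group(1))  # disable all other sites
    return "\n".join(result) + "\n"
```

Running select_datacenter(text, "austin_tx") over the contents of
nagios.cfg leaves the corporate and thirdparty entries alone and
enables only the austin_tx directory.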

Best,

Justin


Re: [Nagios-users] Remote monitoring

2008-02-12 Thread Justin Hitt
Paul,

On Feb 11, 2008 11:06 PM, Paul Aviles [EMAIL PROTECTED] wrote:
 I am looking for a way to remotely monitor Windows servers. The servers are
 on a  remote network and using network address translation so they are not

Place NRPE on your Windows boxes, then talk with (a) your network team
about a low-bandwidth VPN connection between sites, (b) your systems
team about a proxy for NSCA via an SSL connection, or (c) any
combination of the two.

Either way, wrap your traffic in something secure and be sure to
monitor the gateway between your two sites so you'll know when passive
checks might stop.

Best,

Justin


Re: [Nagios-users] Parenting vs Dependencies

2008-02-11 Thread Justin Hitt
Mark,

On Feb 11, 2008 12:13 PM,  [EMAIL PROTECTED] wrote:
 this to my boss. So here is an overview of what I have to monitor, what my
 boss is asking, and what I think we need and maybe someone can beat some

I know listening to the boss is good for your long-term employment;
however, who has to troubleshoot the environment should you get an
alert?  Set up whatever makes the most sense for identifying root
cause and narrowing down a problem -- neither way suggested is wrong,
just start with what's easiest to get set up today.

Best,

Justin


Re: [Nagios-users] Nagios checkresults queue grows over time

2008-02-11 Thread Justin Hitt
Update on 'checkresults' queue growth, Nagios 3.0 rc1 ...
http://www.nagiosexchange.org/nagios-users.34.0.html?tx_maillisttofaq_pi1[mode]=1&tx_maillisttofaq_pi1[showUid]=9116

I can keep the system from coming down completely by eliminating host
checks.  It seems the rapid growth of checks comes from Nagios reading
stale entries and scheduling a recheck, which then becomes stale in turn
because Nagios doesn't get to it in time to process it.

Without host checks, I get fewer ...
[1202750959] Warning: The check of service 'URL' on host 'FQDN0.com'
looks like it was orphaned (results never came back).  I'm scheduling
an immediate check of the service...
... in the logs.

Have any 'checkresults' queuing issues been resolved in RC2?  I
didn't see anything specific in the Changelog.  Is anyone else
experiencing a queue that grows slowly over time and service checks
that are not processed in a timely manner?

Best,

Justin


[Nagios-users] Nagios checkresults queue grows over time

2008-02-08 Thread Justin Hitt
I have two Nagios 3.0 rc1 systems: (A) on a 2.8 GHz Solaris 10 system
with 212 hosts and (B) on a multiple-core VPS system with
2,916 hosts.  On both systems, after the initial host check,
[/usr/local/nagios/var/spool/checkresults] grows in size until Nagios is
unresponsive.

(A) Has a modified configuration with a longer
cached_host_check_horizon=2700 and
cached_service_check_horizon=1800.  I tried to stretch out the time
frame in which checks were accepted.

(B) Has a more standard configuration with reasonable cache counts.

Both systems are using use_large_installation_tweaks=1 and otherwise
have a standard configuration.  Each system allows 45 minutes to finish
the host checks.  I've also tried this configuration without host
checks.

Both systems have very low CPU utilization after the initial host
check and hardly go over 20% during regular operations.

The checkresults queue does go up and down in the number of 'check'
files, often dropping by as much as 200 checks, then popping back up by
twice as much.  I've tried tuning max_check_result_file_age=3600,
which tends to make the queue last longer.

I'm also purging the queue of files older than 90 minutes with ...
0,15,30,45 * * * * ( /usr/local/bin/find /usr/local/nagios/var/spool/checkresults -type f -mmin +90 -exec /bin/rm -f {} \; ) > /dev/null 2>&1
... in the crontab.
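
For anyone who would rather script the purge than rely on 'find', a
minimal Python sketch of roughly the same cleanup follows.  It is an
approximation under assumptions: it only looks at files directly inside
the spool directory (it does not recurse the way 'find' does), and the
function name is illustrative.

```python
import os
import time


def purge_stale_results(spool_dir, max_age_minutes=90, now=None):
    """Delete regular files in spool_dir older than max_age_minutes."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    # Snapshot the directory first so deletions don't disturb the scan.
    with os.scandir(spool_dir) as scan:
        entries = list(scan)
    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Called as purge_stale_results("/usr/local/nagios/var/spool/checkresults")
from a scheduled job, it returns the names of the files it deleted, which
makes it easy to log how much is being thrown away on each pass.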

Finally, here's what I see in the log files ...
[1202485459] Warning: The check of host 'FQDN0.com' looks like it was
orphaned (results never came back).  I'm scheduling an immediate check
of the host...
[1202485459] Warning: The check of host 'FQDN1.com' looks like it was
orphaned (results never came back).  I'm scheduling an immediate check
of the host...
[1202485459] Warning: The check of host 'FQDN2.com' looks like it was
orphaned (results never came back).  I'm scheduling an immediate check
of the host...
... which again is why I tuned max_check_result_file_age and am
purging the queue of really old files.  (I've also tested a very short
max_check_result_file_age; at the current setting I've minimized
flapping.)

Other changes that didn't improve the situation ...
 -- Nice'd the nagios process to give it the highest priority possible.
Increased CPU load a little, but over time got the same idle
conditions after checks were complete.
 -- Stretched out checks to 15 minutes for critical services and 2
hours for nice-to-know services.  Made queues fill up less
frequently.
 -- Looked at disk performance and swapping.  Neither system is
swapping, nor does either have bottlenecks around disk issues.

With the purge routine, I won't see a file in the queue older than 90
minutes.  Does this mean max_check_result_file_age isn't working?  What
other parameters can I adjust?  Does anyone have any ideas about what's
going on?

Best,

Justin



Re: [Nagios-users] problem with using hostgroup_name in service definitions: Error: Hostgroup name and/or alias is NULL

2008-02-08 Thread Justin Hitt
Joost van Baal,

On 2/8/08, Joost van Baal [EMAIL PROTECTED] wrote:
 Is it possible at all to _not_ have an explicit object (with it's own
 define) in the nagios configuration for the number of hosts times the
 number of services on each such host?

I'm managing 2,900+ hosts in one environment and have found you do
need to define each object completely; however, you can use an
object-oriented approach of inherited characteristics.

With this I create a directory for each business unit or group, then a
file in each directory for each host type (e.g. static, dynamic, core,
...), with a generic host in each file that carries the characteristics
specific to that group.  This way I only need 4 lines per host
definition (5 if you have 'parents' defined).  The top of the file
defines any group-specific host checks or intervals.

This is kind of like your [/etc/nagios2-test/head_hosts.cfg] example,
except I may have a 'hosts_network.cfg' and a 'hosts_core.cfg', each
with a more expanded first definition ('lean-host' in your example),
followed by a list of all hosts associated with that definition.
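
To make the "few lines per host" idea concrete, here is a minimal
Python sketch that prints short host definitions inheriting from a
generic template through the 'use' directive.  Which four directives
make up the short definition (use, host_name, alias, address) is my
assumption, and the function name and host data are hypothetical.

```python
def render_hosts(template_name, hosts):
    """Render one short 'define host' block per host tuple.

    hosts is an iterable of (host_name, alias, address, parents) tuples;
    parents may be None when the host has no parent.
    """
    blocks = []
    for name, alias, address, parents in hosts:
        lines = [
            "define host{",
            f"    use        {template_name}",  # inherit the generic host
            f"    host_name  {name}",
            f"    alias      {alias}",
            f"    address    {address}",
        ]
        if parents:
            lines.append(f"    parents    {parents}")  # optional 5th line
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_hosts("lean-host", [
        ("sw01", "Core switch", "192.0.2.1", None),
        ("web01", "Web server", "192.0.2.10", "sw01"),
    ]))
```

The generic 'lean-host' definition itself would sit at the top of the
same file and hold the check commands and intervals shared by the group.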

The same works for service definitions and host dependencies.  Think
of monitoring groups rather than monitoring hosts, then lay out hosts
in groups by category or purpose.  See:
http://nagios.sourceforge.net/docs/3_0/objecttricks.html

I find this also works for building host groups and when multiple
people might be updating the configuration files.

Best,

Justin
