Re: [Nagios-users] DNX Version 0.15 Released!
John,

On Wed, Feb 20, 2008 at 11:14 AM, John Calcote [EMAIL PROTECTED] wrote:
> DNX is a modular extension of Nagios that offloads a significant portion of the work normally done by Nagios to a distributed network

Why not just offload the Nagios checks to Condor, GNU Queue, PBS, or some other distributed job system? All of them support Perl and can be set up to return output to a specific location (i.e. the Nagios 'checkresults' directory or the external commands queue). Or even use additional passive check drones that run Nagios checks in Xen or VMware containers out in the environment, passing data back to your Nagios console.

Don't get me wrong, DNX is a great and viable concept; I just have these other resources already in production. What types of distribution have people used to help Nagios stay on top of checks?

Best,
Justin

___
Nagios-users mailing list Nagios-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null
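The "external commands queue" route mentioned above is easy to script from whatever job system runs the check. A minimal Python sketch, assuming the standard PROCESS_SERVICE_CHECK_RESULT external command format and a command file path such as /usr/local/nagios/var/rw/nagios.cmd (the path is an assumption; check command_file in your nagios.cfg). The function names are hypothetical helpers, not part of Nagios or DNX.

```python
import time
from typing import Optional

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_service_result(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {return_code}")
    if timestamp is None:
        timestamp = int(time.time())
    # Each external command is a single line, so fold multi-line output.
    one_line = " ".join(output.splitlines())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{one_line}\n")


def submit(command_file: str, line: str) -> None:
    """Append a command to the Nagios command file (normally a named pipe).

    Opening a named pipe for writing blocks until Nagios is reading it.
    """
    with open(command_file, "a") as cmd:
        cmd.write(line)
```

A job on a Condor or PBS node would build the line with passive_service_result() and hand it to submit(). The command file is local to the Nagios server, so a job running elsewhere still needs a way to ship the result back (NSCA, for example).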
Re: [Nagios-users] NagiosPluginsNT scaling to 400 sites?
Roger,

On Feb 19, 2008 8:58 AM, Roger [EMAIL PROTECTED] wrote:
> I'm wondering if one centralized Nagios server can use the NagiosPluginsNT project (http://tinyurl.com/2y8ykr) to effectively monitor certain critical internal services from that location.

I'm monitoring 3,100 hosts at one location and 215 at another with very few issues. As long as you move as much work as possible off the server and onto the hosts (i.e. use NSCA or some other kind of passive push), you'll be fine as your volume grows. If you are using virtualization, you need a balanced model: passive checks only on the global zone or host operating system, then light polling of individual zones.

The model you describe sounds like it will work. There is a point where you'll need to optimize your top-end hardware. My 215-site setup is a test bed in a Solaris zone on a single-processor 2.8 GHz P4 ... needless to say, it doesn't run as cleanly as the larger installation.

Some common problems you'll face:

-- Your 'checkresults' queue may grow with stale checks because of the time it takes to cycle through all the hosts. Passive checks reporting to a main console will improve this, and so will a beefier head server.

-- Some of the Monitoring menus become useless. For example, Host Detail will take forever to load and Hostgroup Grid will kill Internet Explorer. It would be nice to disable these menu items.

Best,
Justin
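For the NSCA push described above, send_nsca reads tab-delimited results on standard input: host, service, return code and plugin output for a service check, or host, return code and output for a host check. A small Python sketch that builds that input, assuming the default tab delimiter; nsca_line and nsca_batch are hypothetical helper names. Multi-line output handling has differed between NSCA versions, so this sketch simply folds output onto one line.

```python
from typing import Iterable, Optional, Tuple


def nsca_line(host: str, return_code: int, output: str,
              service: Optional[str] = None, delimiter: str = "\t") -> str:
    """Build one send_nsca input line (a host check when service is None)."""
    # Keep the output on one line and free of the field delimiter.
    clean = " ".join(output.splitlines()).replace(delimiter, " ")
    if service is None:
        fields = [host, str(return_code), clean]
    else:
        fields = [host, service, str(return_code), clean]
    return delimiter.join(fields) + "\n"


def nsca_batch(results: Iterable[Tuple[str, Optional[str], int, str]]) -> str:
    """Join (host, service, return_code, output) tuples into one payload."""
    return "".join(nsca_line(h, c, o, service=s) for h, s, c, o in results)
```

The payload would then be piped into something like `send_nsca -H <central-server> -c /usr/local/nagios/etc/send_nsca.cfg`, where the server name and config path are placeholders for your own setup.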
Re: [Nagios-users] status.cgi very high cpu usage
Steve,

On Feb 18, 2008 10:51 PM, Steve Kieu [EMAIL PROTECTED] wrote:
> I have a problem with status.cgi taking up too much CPU, so the page is very slow to render. Is there any way to find out where the problem is? We have about 650 services monitored. The output of the nagios -s command is

Many of the Monitoring reports don't work well at volume; I've been asking users to use only the Unhandled reports. You may get a better response in Mozilla, but 'status.cgi' can kill Internet Explorer because it loads everything as one large list.

Nagios is at the point where it needs an SQL back end with a more modular approach to storing site data. Perhaps status could be rolled up into summary tables that are queried to build reports, going into the host tables only when someone drills down into host information.

In production you'll want a multi-core, multi-threaded machine; two cores won't do it if you'll have more than one user in the system. Until then, keep users in the Unhandled views under Service Problems and Host Problems.

Best,
Justin
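To make the summary-table idea a little more concrete, here is a sketch using Python's built-in sqlite3 module. Nagios 3.0 does not ship such a back end; the service_status table and its columns are hypothetical. The point is that the default views query one row per hostgroup (or only unhandled problems), and per-host rows are read only on drill-down.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE service_status (
    host_name TEXT NOT NULL,
    hostgroup TEXT NOT NULL,
    service   TEXT NOT NULL,
    state     INTEGER NOT NULL,           -- 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    handled   INTEGER NOT NULL DEFAULT 0, -- acknowledged or in downtime
    PRIMARY KEY (host_name, service)
);
"""


def hostgroup_summary(conn: sqlite3.Connection) -> List[Tuple]:
    """One row per hostgroup with counts per state: cheap to render at any size."""
    return conn.execute("""
        SELECT hostgroup,
               SUM(state = 0), SUM(state = 1), SUM(state = 2), SUM(state = 3)
        FROM service_status
        GROUP BY hostgroup
        ORDER BY hostgroup
    """).fetchall()


def unhandled_problems(conn: sqlite3.Connection) -> List[Tuple]:
    """The 'Unhandled' view: non-OK services nobody has acknowledged yet."""
    return conn.execute("""
        SELECT host_name, service, state
        FROM service_status
        WHERE state <> 0 AND handled = 0
        ORDER BY state DESC, host_name, service
    """).fetchall()


def host_detail(conn: sqlite3.Connection, host_name: str) -> List[Tuple]:
    """Drill-down: per-host rows are only touched when someone asks for them."""
    return conn.execute(
        "SELECT service, state FROM service_status "
        "WHERE host_name = ? ORDER BY service",
        (host_name,),
    ).fetchall()
```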
Re: [Nagios-users] NagiosPluginsNT scaling to 400 sites?
Hari,

On Feb 19, 2008 10:12 AM, Hari Sekhon [EMAIL PROTECTED] wrote:
> If they don't appear in the sidebar, then you're unlikely to type in the url to hang the browser even if you knew what it was.

Excellent. I commented out the Detail and Grid options in 'side.html' ... that definitely keeps people from bogging down the server with 'status.cgi' and still lets them drill down into specific selections. Thank you.

Best,
Justin
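Editing side.html by hand works fine; if you maintain several consoles, a script can make the same change. A minimal Python sketch, assuming each menu link sits on its own line in side.html. The link-text patterns used below ("Service Detail" and so on) are assumptions, so check them against your own copy of side.html and keep a backup before rewriting it.

```python
import re
from typing import Iterable


def comment_out_links(html: str, patterns: Iterable[str]) -> str:
    """Wrap each line matching any pattern in <!-- ... --> (safe to re-run)."""
    compiled = [re.compile(p) for p in patterns]
    out = []
    for line in html.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Lines that are already comments are left alone, so re-running is harmless.
        already_commented = body.lstrip().startswith("<!--")
        if not already_commented and any(rx.search(body) for rx in compiled):
            body = f"<!-- {body} -->"
        out.append(body + ending)
    return "".join(out)
```

For example, `comment_out_links(text, [r"Service Detail", r"Host Detail", r"Grid"])` would hide the detail and grid menu entries while leaving the Unhandled links in place. A link that spans several lines would need a different approach.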
Re: [Nagios-users] conditional inclusion of cfg files
Jonathan,

Jonathan Mills wrote:
> Okay, let's say you have lots of distributed pollers, which should only load cfg files for their particular environment (both hosts and services). However, you'd like to manage the same set of global cfg

I set up configurations in subdirectories by datacenter (the location of the distributed poller), then have a local 'sed' script change the location of the configuration relevant to that specific box. These directories are also broken out individually:

$NAGIOS_HOME/etc/corporate/$ASSET_GROUP/{hosts,groups,contact}.cfg
$NAGIOS_HOME/etc/datacenter/$CITY_STATE/{hosts,groups,contact}.cfg
$NAGIOS_HOME/etc/thirdparty/$VENDOR/{hosts,groups,contact}.cfg

All my pollers check a common pool (corporate, thirdparty) and have local checks for their own site (datacenter). $CITY_STATE could also be the name of your host; that makes 'sed' a little easier.

Use 'cfg_dir=' directives in nagios.cfg. You could use 'cfengine'; I use 'sed'. Either way, just comment out all datacenters on your console and uncomment the one needed for a particular datacenter. If you don't want to use 'sed' or 'cfengine', just uncomment it by hand.

Best,
Justin
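The 'sed' step can also be done in Python. This sketch assumes nagios.cfg holds one cfg_dir= line per datacenter under .../etc/datacenter/<name>, as in the layout above, and that a leading '#' comments a line out. select_datacenter and local_datacenter are hypothetical helpers; using the short hostname as the datacenter name follows the "$CITY_STATE could also be the name of your host" suggestion.

```python
import re
import socket

# Matches an active or commented-out cfg_dir line that points at a
# datacenter subdirectory, e.g. "cfg_dir=/usr/local/nagios/etc/datacenter/austin_tx".
DC_LINE = re.compile(
    r"^#?\s*(?P<directive>cfg_dir=\S*/datacenter/(?P<dc>[^/\s]+)/?)\s*$")


def local_datacenter() -> str:
    """Use the short hostname as the datacenter name (one possible convention)."""
    return socket.gethostname().split(".")[0]


def select_datacenter(cfg_text: str, local_dc: str) -> str:
    """Enable the cfg_dir line for local_dc and comment out the other datacenters.

    Shared directories (corporate, thirdparty) and all other lines pass through.
    """
    out = []
    for line in cfg_text.splitlines(keepends=True):
        body = line.rstrip("\n")
        newline = line[len(body):]
        m = DC_LINE.match(body)
        if m:
            directive = m.group("directive")
            body = directive if m.group("dc") == local_dc else "#" + directive
        out.append(body + newline)
    return "".join(out)
```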
Re: [Nagios-users] Remote monitoring
Paul,

On Feb 11, 2008 11:06 PM, Paul Aviles [EMAIL PROTECTED] wrote:
> I am looking for a way to remotely monitor Windows servers. The servers are on a remote network and using network address translation so they are not

Place NRPE on your Windows boxes, then talk with (a) your network team about a low-bandwidth VPN connection between sites, (b) your systems team about a proxy for NSCA over an SSL connection, or (c) both. Either way, wrap your traffic in something secure, and be sure to monitor the gateway between your two sites so you'll know when passive checks might stop.

Best,
Justin
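Nagios can already flag passive results that stop arriving, via the check_freshness and freshness_threshold service directives. The sketch below only illustrates that logic as a standalone plugin-style Python function returning the usual 0/1/2/3 codes; the function name and the warning/critical thresholds are assumptions for illustration.

```python
import time
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_freshness(last_result: Optional[float], warn_after: float,
                      crit_after: float,
                      now: Optional[float] = None) -> Tuple[int, str]:
    """Map the age (seconds) of the last passive result to a Nagios state and message."""
    if last_result is None:
        return UNKNOWN, "UNKNOWN - no passive result received yet"
    if now is None:
        now = time.time()
    age = now - last_result
    if age >= crit_after:
        return CRITICAL, f"CRITICAL - last passive result {age:.0f}s ago"
    if age >= warn_after:
        return WARNING, f"WARNING - last passive result {age:.0f}s ago"
    return OK, f"OK - last passive result {age:.0f}s ago"
```

A plugin wrapper would print the message and exit with the returned code. Going stale on every host behind the same gateway at once points at the link rather than the hosts.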
Re: [Nagios-users] Parenting vs Dependencies
Mark,

On Feb 11, 2008 12:13 PM, [EMAIL PROTECTED] wrote:
> this to my boss. So here is an overview of what I have to monitor, what my boss is asking, and what I think we need and maybe someone can beat some

I know listening to the boss is good for your long-term employment; however, who has to troubleshoot the environment should you get an alert? Set up whatever makes the most sense for identifying root cause and narrowing down a problem. Neither suggested approach is wrong; just start with what's easiest to set up today.

Best,
Justin
Re: [Nagios-users] Nagios checkresults queue grows over time
Update on 'checkresults' queue growth, Nagios 3.0 rc1 ...

http://www.nagiosexchange.org/nagios-users.34.0.html?tx_maillisttofaq_pi1[mode]=1&tx_maillisttofaq_pi1[showUid]=9116

I can keep the system from coming down completely by eliminating host checks. It seems the rapid growth comes from Nagios reading stale entries, then scheduling a recheck, which in turn goes stale because Nagios doesn't get to it in time to process it. Without host checks, I get fewer ...

[1202750959] Warning: The check of service 'URL' on host 'FQDN0.com' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service...

... in the logs. Have any 'checkresults' queuing issues been resolved in RC2? I didn't see anything specific in the Changelog. Is anyone else experiencing a queue that grows slowly over time and doesn't process service checks in a timely manner?

Best,
Justin
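To see whether orphaned checks are accumulating over time, the warnings quoted above can be counted straight out of nagios.log. A Python sketch, assuming the log wording shown in this thread (other Nagios versions may phrase it differently); count_orphans and orphans_per_hour are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable

# Matches both the host and the service variant of the orphan warning.
ORPHAN = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: The check of "
    r"(?:service '(?P<service>[^']*)' on )?host '(?P<host>[^']*)' "
    r"looks like it was orphaned")


def count_orphans(lines: Iterable[str]) -> Counter:
    """Count orphan warnings per (host, service); service is None for host checks."""
    counts: Counter = Counter()
    for line in lines:
        m = ORPHAN.match(line)
        if m:
            counts[(m.group("host"), m.group("service"))] += 1
    return counts


def orphans_per_hour(lines: Iterable[str]) -> Counter:
    """Bucket orphan warnings by hour (Unix timestamp rounded down to the hour)."""
    hours: Counter = Counter()
    for line in lines:
        m = ORPHAN.match(line)
        if m:
            hours[int(m.group("ts")) // 3600 * 3600] += 1
    return hours
```

Run against the log file (for example `orphans_per_hour(open("/usr/local/nagios/var/nagios.log"))`, where the path is an assumption), a steadily rising hourly count would match the slow queue growth described here.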
[Nagios-users] Nagios checkresults queue grows over time
I have two Nagios 3.0 rc1 systems: (A) a 2.8 GHz Solaris 10 system with 212 hosts and (B) a multi-core VPS system with 2,916 hosts. On both systems, after the initial host check, the [/usr/local/nagios/var/spool/checkresults] queue grows until Nagios becomes non-responsive.

(A) has a modified configuration with a longer cached_host_check_horizon=2700 and cached_service_check_horizon=1800; I tried to stretch out the time frame in which checks were accepted. (B) has a more standard configuration with reasonable cache horizons. Both systems use use_large_installation_tweaks=1 and are otherwise configured normally. Each system allows 45 minutes to finish the host checks. I've also tried this configuration without host checks.

Both systems have very low CPU utilization after the initial host check and hardly go over 20% during regular operations. The number of check files in the checkresults queue does go up and down, often dropping by as many as 200 checks, then popping back up by twice as much. I've tried tuning max_check_result_file_age=3600, which tends to make the queue last longer. I'm also purging the queue of files older than 90 minutes with this crontab entry:

0,15,30,45 * * * * ( /usr/local/bin/find /usr/local/nagios/var/spool/checkresults -type f -mmin +90 -exec /bin/rm -f {} \; ) > /dev/null 2>&1

Finally, here's what I see in the log files ...

[1202485459] Warning: The check of host 'FQDN0.com' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1202485459] Warning: The check of host 'FQDN1.com' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1202485459] Warning: The check of host 'FQDN2.com' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...

... which again is why I tuned max_check_result_file_age and am purging the queue of really old files.
(I've also tested a very short max_check_result_file_age; at the current setting I've minimized flapping.)

Other changes that didn't improve the situation:

-- Nice'd the nagios process to give it the highest priority possible. This increased CPU load a little, but over time I got the same idle conditions after checks were complete.

-- Stretched out checks to 15 minutes for critical services and 2 hours for nice-to-know services. This made the queues fill up less frequently.

-- Looked at disk performance and swapping. Neither system is swapping, and neither has disk bottlenecks.

With the purge routine, I won't see a file in the queue older than 90 minutes. Does this mean max_check_result_file_age isn't working? What other parameters can I adjust? Does anyone have any ideas about what's going on?

Best,
Justin
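One way to test the max_check_result_file_age question is to pause the cron purge for a while and measure the queue directly. The Python sketch below has two hypothetical helpers: queue_report counts result files and how many are older than a given limit (for example the 3600 seconds configured above), and purge_older_than is a rough stand-in for the find/rm crontab entry. Unlike find, it does not descend into subdirectories.

```python
import os
import time
from typing import Dict, List, Optional


def queue_report(directory: str, max_age_seconds: float,
                 now: Optional[float] = None) -> Dict[str, float]:
    """Summarize result files: count, oldest age, and how many exceed the limit."""
    if now is None:
        now = time.time()
    with os.scandir(directory) as entries:
        ages = [now - e.stat(follow_symlinks=False).st_mtime
                for e in entries if e.is_file(follow_symlinks=False)]
    return {
        "files": len(ages),
        "oldest_seconds": max(ages, default=0.0),
        "older_than_limit": sum(1 for age in ages if age > max_age_seconds),
    }


def purge_older_than(directory: str, max_age_minutes: float,
                     now: Optional[float] = None) -> List[str]:
    """Delete regular files whose mtime is more than max_age_minutes in the past."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_minutes * 60
    # Collect first, then delete, so the directory isn't modified mid-scan.
    with os.scandir(directory) as entries:
        stale = [e for e in entries
                 if e.is_file(follow_symlinks=False)
                 and e.stat(follow_symlinks=False).st_mtime < cutoff]
    removed = []
    for entry in stale:
        try:
            os.remove(entry.path)
        except FileNotFoundError:
            # Nagios may have reaped the file between the scan and the delete.
            continue
        removed.append(entry.name)
    return sorted(removed)
```

With the purge paused, `queue_report("/usr/local/nagios/var/spool/checkresults", 3600)` reporting a persistent, non-zero older_than_limit would suggest old files are not being cleaned up at the configured age.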
Re: [Nagios-users] problem with using hostgroup_name in service definitions: Error: Hostgroup name and/or alias is NULL
Joost van Baal,

On 2/8/08, Joost van Baal [EMAIL PROTECTED] wrote:
> Is it possible at all to _not_ have an explicit object (with its own define) in the nagios configuration for the number of hosts times the number of services on each such host?

I'm managing 2,900+ hosts in one environment and have found you do need to define each object completely; however, you can use an object-oriented approach of inherited characteristics. With this I create a directory for each business unit or group, then a file in each directory for each host type (i.e. static, dynamic, core, ...), with a generic host in each file carrying the characteristics specific to that group. This way I only need 4 lines for host definitions (5 if you have 'parents' defined).

The top of the file defines any group-specific host checks or intervals. This is similar to your [/etc/nagios2-test/head_hosts.cfg] example, except I may have a hosts_network.cfg and a hosts_core.cfg, each with a more expanded first definition (lean-host in your example) followed by a list of all hosts associated with that definition. The same works for service definitions and host dependencies.

Think in terms of monitoring groups rather than monitoring hosts, then lay out hosts in groups by category or purpose. See: http://nagios.sourceforge.net/docs/3_0/objecttricks.html

I find this also works for building host groups and when multiple people might be updating the configuration files.

Best,
Justin
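Nagios resolves template inheritance when it loads the configuration, which is why each host entry only needs what differs from its generic definition. As an illustration of the idea (not Nagios's actual implementation), here is a Python sketch that follows a single-template `use` chain. Nagios 3 also accepts several comma-separated templates in `use`, which this simplification ignores, and the template names in the test are made up.

```python
from typing import Dict, Optional, Set

# Attributes that describe the template itself and are not passed to children.
NOT_INHERITED = {"name", "register", "use"}


def resolve(name: str, objects: Dict[str, Dict[str, str]],
            _seen: Optional[Set[str]] = None) -> Dict[str, str]:
    """Return an object's effective attributes after following its `use` chain.

    Simplification: supports a single template per object.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"circular template reference at {name!r}")
    seen.add(name)
    obj = objects[name]
    parent = obj.get("use")
    inherited = resolve(parent, objects, seen) if parent else {}
    merged = {k: v for k, v in inherited.items() if k not in NOT_INHERITED}
    # Attributes set on the object itself override inherited values.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged
```

A group file then amounts to one expanded template at the top plus short host entries, each carrying little more than `use`, host_name and address.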