Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> I'm continuing to iron out the wrinkles with 3.5.1 and distributed monitoring. I'm using mod_gearman to submit and receive events from two distributed pollers. Every now and again, I'll get something similar in the log on the centralized collecting machine:
>
>     CRITICAL: Return code of 127 is out of bounds. Make sure the plugin youre trying to run actually exists. (worker: collector.domain.org)
>
> To me, that suggests that the collector system didn't get a result for a host or service in a timely manner from one of the polling systems, and so it attempted to run an active check itself. However, it doesn't seem to be able to, and I don't know why. The collector has the same value for $USER1$, and it has the same set of plugins installed on it:
>
> On the collector:
>
>     grep USER1 etc/resource.cfg
>     $USER1$=/usr/local/nagios/libexec
>
> On the two pollers:
>
>     $USER1$=/usr/local/nagios/libexec
>     $USER1$=/usr/local/nagios/libexec
>
> The plugins are installed in identical locations on all three systems; that's enforced via Puppet. The 'nagios' user can find and run them on the collector:
>
>     /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
>     NRPE v2.13
>
> Now, because this is a distributed setup, the collector system is not configured to run active checks:
>
>     grep ^execute etc/nagios.cfg
>     execute_service_checks=0
>     execute_host_checks=0
>
> ... but *obviously* it's trying to. Is it failing because it's configured to not run them? If that's the case, the error message is not accurate and should be corrected. If that's *not* the case, why can't my collector server run an active check when it believes it needs to?
>
> I use NConf to generate my configurations, if that matters. There are a *lot* of hosts/services and quite a few configuration files, so I'm not going to paste a slew of information here. If I'm missing pertinent information, please let me know exactly what you want to see and I'll get it.

No one has an idea about this? And no, Andreas, I can't move to 4.0 yet. ;)

Thanks!
Benny

-- 
No matter how tempted I am with the prospect of unlimited power, I will not consume any energy field bigger than my head.
    -- #22 on Peter Anspach's Evil Overlord list

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
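For context on the exit status in the error quoted above: a POSIX shell exits with status 127 when it cannot find the command it was asked to run, whereas Nagios plugins are expected to exit with 0 to 3. The following is only a minimal sketch to reproduce that, not how Nagios or mod_gearman run checks internally. It assumes `sh` is on the PATH, and /nonexistent/check_dummy is a made-up path.

```python
import subprocess

# Plugin exit codes that Nagios understands.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_check(command_line):
    """Run a check command through the shell and classify its exit status."""
    proc = subprocess.run(["sh", "-c", command_line],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    state = PLUGIN_STATES.get(proc.returncode, "out of bounds")
    return proc.returncode, state

# Hypothetical plugin path that does not exist on this machine.
missing = run_check("/nonexistent/check_dummy -H 127.0.0.1")
# A stand-in for a plugin that exits with CRITICAL.
present = run_check("exit 2")
```

So a 127 usually means the process that ran the check could not find the command at the path it was given, which is worth checking on whichever machine actually executed it.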
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
Do you get many of those error messages in the logs at once, or just one at a time?

Only one thought: what are the permissions on the file that defines your $USER*$ variables? Nagios on my systems calls setuid() to a non-root user after startup, and if it then gets a SIGHUP to reload its config but can't read the file defining $USER*$, it will act strangely.

Justin

On Wed, Aug 28, 2013 at 06:48:09AM -0500, C. Bensend wrote:
> I'm continuing to iron out the wrinkles with 3.5.1 and distributed monitoring. I'm using mod_gearman to submit and receive events from two distributed pollers. Every now and again, I'll get something similar in the log on the centralized collecting machine:
>
>     CRITICAL: Return code of 127 is out of bounds. Make sure the plugin youre trying to run actually exists. (worker: collector.domain.org)
>
> [...]
>
> No one has an idea about this? And no, Andreas, I can't move to 4.0 yet. ;)
>
> Thanks!
> Benny
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> Do you get many of those error messages in the logs at once, or just one at a time?
>
> Only one thought: what are the permissions on your $USER$ variables? Nagios on my systems setuid() to nonroot after startup, and if it gets SIGHUP to reload config, but can't read the file defining $USER*$, will act strangely.

Just one at a time, seemingly randomly. A host here, a service there, several times a day. They always recover almost immediately, but I don't understand why my centralized collector seems to have this issue.

Nagios runs as the nagios user, which can read the resource.cfg file fine:

    ls -ld . ; ls -l nagios-hostname.cfg resource.cfg
    drwxrwx--- 6 root nagios 4096 Aug 27 16:02 .
    -rw-r--r-- 1 root root 47606 Jul 1 11:18 nagios-hostname.cfg
    -rw-r- 1 root nagios 2400 Mar 19 11:25 resource.cfg

Thanks!
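The permission question can also be checked mechanically. As a rough sketch, the helper below applies the usual owner/group/other read rules to a mode and ownership. The name can_read and the numeric IDs are illustrative only, and the mode 0o640 is an assumption, since the listing above does not show the complete permission string for resource.cfg.

```python
import stat

def can_read(mode, file_uid, file_gid, uid, gids):
    """Return True if a non-root user (uid, gids) may read a file with this mode."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

# Illustrative IDs: root is 0, and 500 stands in for the nagios user and group.
ROOT, NAGIOS = 0, 500

# A root:nagios file with an assumed mode of 0640, read by the nagios user.
nagios_can_read = can_read(0o640, ROOT, NAGIOS, NAGIOS, {NAGIOS})
```

On a live system the real values would come from os.stat() on the file and from the IDs of the running Nagios process.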
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
On 8/22/13 13:51, C. Bensend wrote:
> CRITICAL: Return code of 127 is out of bounds. Make sure the plugin youre trying to run actually exists. (worker: collector.domain.org)

Hi,

if this is the collector host, why does it have a mod-gearman worker installed? If Nagios had run the check by itself, there would be no hint about the worker in the error. So it seems there is a worker running on your collector host which grabs some checks but isn't able to execute them.

Regards,
Sven

-- 
Sven Nierlein                 sven.nierl...@consol.de
ConSol* GmbH                  http://www.consol.de
Franziskanerstrasse 38        Tel.: 089/45841-439
81669 Muenchen                Fax.: 089/45841-111
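Sven's point is that the "(worker: ...)" suffix names the mod_gearman worker that executed the check. A small sketch for pulling that hostname out of result lines follows; the regular expression is inferred from the single line quoted in this thread and is an assumption, not something taken from the mod_gearman sources.

```python
import re

# Pattern inferred from the one log line quoted in this thread.
WORKER_RE = re.compile(r"\(worker: (?P<host>[^)\s]+)\)")

def worker_of(line):
    """Return the worker hostname embedded in a check result, or None."""
    match = WORKER_RE.search(line)
    return match.group("host") if match else None

line = ("CRITICAL: Return code of 127 is out of bounds. Make sure the plugin "
        "youre trying to run actually exists. (worker: collector.domain.org)")
failing_worker = worker_of(line)
```

Grouping the failing results by this hostname would show whether every 127 comes from the worker on the collector or from the pollers as well.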
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> On 8/22/13 13:51, C. Bensend wrote:
>> CRITICAL: Return code of 127 is out of bounds. Make sure the plugin youre trying to run actually exists. (worker: collector.domain.org)
>
> Hi, if this is the collector host, why does it have a mod-gearman worker installed? If nagios would have run the check by itself, there would be no hint about the worker in the error. So it seems like there is a worker started on your collector host which then grabs some checks but isn't able to execute them.

Oh ho! I have multiple *gearman* processes running:

    ps axuww | grep gearman
    gearmand  5662  0.7  0.1 404672  2496 ?  Ssl  Aug17 118:29 /usr/sbin/gearmand -d -l /var/log/gearmand/gearmand.log
    nagios    5712  0.0  0.0  38024   640 ?  Ss   Aug17   1:03 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
    nagios   25919  0.0  0.1 137492  3016 ?  S    07:38   0:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
    .. etc ..

Are you saying I just need gearmand running on the collector? I'm quite new to gearman, so I might have misunderstood which parts are necessary where. I can easily shut down the mod_gearman_worker service; I just need to understand the consequences.

I assumed that this was a Nagios error. Perhaps I just have my gearman setup configured wrong.

Benny
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
On 8/28/13 14:43, C. Bensend wrote:
> Are you saying I just need gearmand running on the collector?

Well, I assumed so. You are the only one who can really tell. You will need a worker on each host that should run checks. If your collector should not run any checks, then no worker is necessary. See http://labs.consol.de/nagios/mod-gearman/#_common_scenarios for a list of common setups.

Sven
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> On 8/28/13 14:43, C. Bensend wrote:
>> Are you saying I just need gearmand running on the collector?
>
> Well, i assumed it. You are the only one which really can tell that. You will need a worker on each host which should run checks. If your collector should not run any checks, than no worker is necessary. See http://labs.consol.de/nagios/mod-gearman/#_common_scenarios for a list of common setups.

OK, yes, I grok that. I guess I would want the collector to be *able* to run checks if it doesn't get timely information from the pollers. I'm assuming that's why it's even trying in the first place: it doesn't see a result in a timely manner, so it thinks it should run one.

Which circles back to my original question: why can't it run the check? Why isn't it finding what it needs to find? The workers are running as the nagios user, and I don't see anything that appears pertinent in the mod_gearman_worker.conf file... What am I missing?

Neither the gearmand.log nor the mod_gearman_worker.log file seems to have any complaints (but I haven't bumped up the debug level on them yet).

Thanks so much for your help!

Benny
[Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
Hey folks,

I'm continuing to iron out the wrinkles with 3.5.1 and distributed monitoring. I'm using mod_gearman to submit and receive events from two distributed pollers. Every now and again, I'll get something similar in the log on the centralized collecting machine:

    CRITICAL: Return code of 127 is out of bounds. Make sure the plugin youre trying to run actually exists. (worker: collector.domain.org)

To me, that suggests that the collector system didn't get a result for a host or service in a timely manner from one of the polling systems, and so it attempted to run an active check itself. However, it doesn't seem to be able to, and I don't know why. The collector has the same value for $USER1$, and it has the same set of plugins installed on it:

On the collector:

    grep USER1 etc/resource.cfg
    $USER1$=/usr/local/nagios/libexec

On the two pollers:

    $USER1$=/usr/local/nagios/libexec
    $USER1$=/usr/local/nagios/libexec

The plugins are installed in identical locations on all three systems; that's enforced via Puppet. The 'nagios' user can find and run them on the collector:

    /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
    NRPE v2.13

Now, because this is a distributed setup, the collector system is not configured to run active checks:

    grep ^execute etc/nagios.cfg
    execute_service_checks=0
    execute_host_checks=0

... but *obviously* it's trying to. Is it failing because it's configured to not run them? If that's the case, the error message is not accurate and should be corrected. If that's *not* the case, why can't my collector server run an active check when it believes it needs to?

I use NConf to generate my configurations, if that matters. There are a *lot* of hosts/services and quite a few configuration files, so I'm not going to paste a slew of information here. If I'm missing pertinent information, please let me know exactly what you want to see and I'll get it.

I'd really appreciate a clue-by-four. Thanks, folks!
:)

Benny
[Nagios-users] Distributed monitoring: v3.4.1 not translating host states like it should
Hey folks,

I am in the process of implementing a distributed monitoring architecture, and I'm having some problems with host state. Here are the specs:

    Nagios v3.4.1
    RHEL 6.3
    Using NSCA to send results to passive collector

Yes, I have 'translate_passive_host_checks' set on the collector. :)

So, the system is up and running, and I do see host alerts in /var/log/messages on the collector. However, in the web interface, all hosts remain up. I can go into the host details for a host that's offline because of Sandy, and it reports a host status of UP, with the status information "PING CRITICAL - Packet loss 100%". Obviously, the host states coming from the passive monitors are not being translated.

Active host and service checks are disabled on the collector, and enabled on the monitors. Passive host and service checks are enabled everywhere, and the collector *is* receiving them.

I'd appreciate it if someone can help me out here... I'll provide whatever details are necessary...

Thanks much!

Benny

-- 
Unless you're a lawyer, you don't understand Oracle licensing. That applies equally to Oracle employees as well as customers.
    -- Me, 2012-05-10
Re: [Nagios-users] Distributed Monitoring
Hi,

Thanks Dan! I'm reading about check_mk with Livestatus and I think it'll help me.

-- 
Wallace Gerheim
[Nagios-users] Distributed Monitoring
Hello folks,

I'm new to Nagios and to the nagios-users mailing list. I was looking for add-ons for Nagios but I didn't find one which suits me. Let me explain my scenario. If someone could help me, I'll be grateful.

I have one central Nagios with Centreon. I have another Nagios (worker), also with Centreon, on which I want to configure some hosts and hostgroups that I don't want to configure on the central Nagios. But I want to monitor the worker's hosts and hostgroups from the central Nagios. In summary: when I manipulate hosts on the workers, I don't want to have to put them in the central Nagios.

I've found NSCA, DNX and Gearman, but they don't help. Could anyone help?

Thanks!

Wallace Knopp de Menezes Gerheim
Re: [Nagios-users] Distributed Monitoring
Gerheim wrote:
> Hello folks, I'm new on Nagios and nagios-users mailing list. [...]
> Summarizing, when i manipulate hosts on workers i don't want to put in nagios central. I've found NCSA, DNX and Gearman but it don't help. Could anyone?

Hello and welcome.

For distributed Nagios you want to start here: http://nagios.sourceforge.net/docs/3_0/distributed.html

The main point you need to know is that, in order for the central Nagios to see the hosts monitored by the worker, the definitions of the remotely monitored hosts also need to be on the central Nagios (defined as passive checks and hosts). Nagios currently has no way of detecting the child hosts of the worker and adding them to its configuration; therefore they have to be defined on both servers (with minor changes: active/passive).

Assaf
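Since the same host has to exist on both servers with only the active/passive settings changed, the central copy can be derived from the worker copy rather than maintained by hand. This is a minimal sketch of that idea, assuming host definitions are held as plain directive dictionaries. The host web01, its address and the generic-host template are made up, while active_checks_enabled and passive_checks_enabled are the standard Nagios host directives.

```python
def central_variant(host):
    """Copy a host definition, switching it to passive-only for the central server."""
    central = dict(host)
    central["active_checks_enabled"] = "0"
    central["passive_checks_enabled"] = "1"
    return central

def render(host):
    """Render a directive dict as a Nagios 'define host' block."""
    body = "\n".join("    {:<28}{}".format(key, value) for key, value in host.items())
    return "define host{\n" + body + "\n}\n"

# Hypothetical host as it might be defined on the worker.
worker_host = {
    "use": "generic-host",
    "host_name": "web01",
    "address": "192.0.2.10",
    "active_checks_enabled": "1",
    "passive_checks_enabled": "1",
}
central_host = central_variant(worker_host)
central_cfg = render(central_host)
```

The rendered text would still have to be written to a file the central server includes, followed by a reload.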
Re: [Nagios-users] Distributed Monitoring
On 30/03/11 14:00, Gerheim wrote:
> Hello folks, I'm new on Nagios and nagios-users mailing list. [...]
> Summarizing, when i manipulate hosts on workers i don't want to put in nagios central. I've found NCSA, DNX and Gearman but it don't help. Could anyone? Thanks!

At $dayjob I discovered the same problem; however, we also had another problem in that a lot of the servers we want to monitor are in walled gardens and only have http(s) access to the internet.

Our solution was to semi-create our own uploader mechanism. We use nsca on the receiver side and a custom submit_check_results that, instead of piping the results through send_nsca, writes them into a data file for the time/service, which gets stored for scheduled upload via cron. This scheduled uploader also keeps track of the hosts/localhost.cfg file and, if it detects a change, adds that to the upload too. These files are then bzipped and uploaded via curl to a PHP script on the Nagios server that knows how to save them for use.

The Nagios server then has a scheduled process that first looks to see whether there's a new config file for hosts, moves it into etc/hosts/*.cfg (separate files per host) and reloads Nagios; it then takes the contents of the remaining Nagios data files and pipes them through nsca.

Although this scenario is far from perfect, and we've only been working on it for 2 weeks on and off, it seems to suit our needs.

Maybe a config collector plugin, something like the Cisco router config one, could be modified to store the Nagios host config and pass it through nsca, although I'm not sure how you'd process that on the server to update the host config there.

-- 
Steve Wilson
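The spooling step described above can be sketched roughly as follows. This is not Steve's actual script: it only illustrates writing results in the tab-separated form that send_nsca reads for service checks (host, service description, return code, plugin output) to a spool instead of sending them right away. The host and service names are made up, and an in-memory buffer stands in for the data file.

```python
import io

def nsca_line(host, service, return_code, output):
    """Format one service check result as a tab-separated send_nsca input line."""
    fields = [host, service, str(return_code), output.replace("\n", " ")]
    return "\t".join(fields) + "\n"

def spool(stream, results):
    """Append results to a spool stream instead of sending them immediately."""
    for result in results:
        stream.write(nsca_line(*result))

# An in-memory buffer stands in for the spool file; host and service are made up.
buf = io.StringIO()
spool(buf, [("web01", "Disk /", 1, "DISK WARNING - 12% free")])
spooled = buf.getvalue()
```

A later job could then upload the spool file and feed its lines to the receiving side unchanged.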
Re: [Nagios-users] Distributed Monitoring
My vote is to look at Multisite and Livestatus from the check_mk project.

Dan

From: Gerheim [mailto:wallacegerh...@gmail.com]
Sent: Wednesday, March 30, 2011 8:01 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring

> Hello folks, I'm new on Nagios and nagios-users mailing list. [...]
> Summarizing, when i manipulate hosts on workers i don't want to put in nagios central. I've found NCSA, DNX and Gearman but it don't help. Could anyone? Thanks!
Re: [Nagios-users] Distributed Monitoring
Hi Wallace,

On Wed, Mar 30, 2011 at 3:00 PM, Gerheim wallacegerh...@gmail.com wrote:
> I'm new on Nagios and nagios-users mailing list. [...]
> Summarizing, when i manipulate hosts on workers i don't want to put in nagios central.

The standard distributed configuration for Centreon is to have a central web server (Centreon Nagios) and pollers (Nagios). The whole configuration is defined on the central server, and you associate each host with the correct Nagios engine. Then, with the access control list feature, you can easily choose whether or not to display hostgroup/host monitoring on the web side for each connected user.

Best regards.

-- 
Romain LE MERLUS
Re: [Nagios-users] Distributed Monitoring
Thanks everyone for the replies. I think I wasn't clear in my explanation...

My central Nagios has a lot of objects (including the Nagios client objects). I want to manipulate only a few of them on the Nagios clients. I'm looking to decentralize the scope of the clients. For example: I have 3 clients. Each one is responsible for monitoring some hosts. When they do that, they return the results to the central server. The clients have Nagios and Centreon too.

Thanks

P.S.: Sorry if I wasn't clear. I'm learning English.

-- 
Wallace
[Nagios-users] Distributed monitoring and dependencies
Good day all.

I've taken over ownership of a very hacked and mutilated Nagios installation in house, and I'm busy building a migration plan and designing the new Nagios instance. I have some questions which the documentation is not making apparent, likely due to my lack of understanding of Nagios, not the documentation.

We have 3 physical locations which will be monitored, a number which will likely increase, and I'm looking at a distributed monitoring setup as described in the documentation here: http://nagios.sourceforge.net/docs/3_0/distributed.html

Now the documentation mentions:

> The purpose of the central server is to simply listen for service check results from one or more distributed servers. Even though services are occasionally actively checked from the central server, the active checks are only performed in dire circumstances, so lets just say that the central server only accepts passive check for now

We are also looking at using dependencies between hosts and services across all locations, which, according to the documentation and my understanding of it, might be a problem:

> Execution dependencies are used to restrict when active checks of a service can be performed. Passive checks are not restricted by execution dependencies

Unfortunately the check scheduling logic link is still in TODO status, so I cannot explore further.

Is my understanding correct? If not, can you use distributed monitoring together with host and service dependencies?

As a final question: I'd like to be able to monitor a single host from the different locations, to be able to identify links going down using the above configuration. Would I have to configure the host 3 times, once for each Nagios server, or is there a different way?

Regards
Henti
Re: [Nagios-users] Distributed Monitoring best practices
On 18 May 2010 15:21, Christoph Kluenter c...@iphh.net wrote:

> I am thinking about testing DNX (dnx.sf.net). But since one can't define which check will run on which node, we would have to reconfigure a lot of firewalls. Would DNX be worth this hassle? Any experiences?

I'm interested in this too. Any suggestions for completely centralized monitoring?

Thanks
[Nagios-users] Distributed monitoring
Wrong thread before, sorry.

On 6 May 2010 11:55, Enrico Zimol lomiz.m...@gmail.com wrote:

Hi all,

I'm a Nagios newbie and I'm writing to ask for suggestions about how to structure my monitoring setup. I have to monitor Linux servers for about 15 to 20 customers, with 1 to 5 servers per customer. We don't have a VPN to the customers, so these servers are all behind NAT. That isn't a problem, because we administer the firewalls (other Linux servers), so we can manage any kind of DNAT and filter rule.

I read in the official documentation that the NSCA addon is suggested for distributed monitoring, but we chose the NRPE addon for several reasons:

- the customers force us to do that
- the number of monitored servers per customer will never grow
- the services to monitor are the same on each server (hardware/software RAID, disk usage, etc.)
- we need a completely centralized monitoring structure

For the last point I thought of using the arguments option of NRPE (yes, I read the SECURITY document). Besides, to solve the NAT problem with NRPE, I'll do DNAT on the firewall and use the port parameter of the check_nrpe plugin. (Are there any problems with doing that? I did a few small tests, but I'd prefer a confirmation.)

To manage this I need a well-organized config file structure on the Nagios server. I thought of structuring it like this:

obj
 |-- templatelinuxserversgeneral.cfg
 |-- customer_1_directory
 |     |-- templateserver.cfg
 |     |-- server1.cfg
 |     |-- server2.cfg
 |     |-- servern.cfg
 |-- customer_2_directory
       |-- templateserver.cfg
       |-- server1.cfg
       |-- servern.cfg

Where:

- templatelinuxserversgeneral.cfg is a very basic server template
- customer_1_directory contains one file for each of that customer's servers
- templateserver.cfg uses templatelinuxserversgeneral and adds more specific variables common to that customer's servers, such as the public IP address, which is the same for all of that customer's servers
- servern.cfg holds the variables specific to one server, such as the NRPE port (see above)

What do you think? How can I organize that service-server combination?

Thanks so much.

P.S. Sorry for my bad English.

-- Enrico Zimol
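The DNAT approach described above means every server of one customer shares the firewall's public address and differs only in the forwarded port. A minimal sketch of how the resulting check_nrpe command lines would look, assuming the plugin lives in /usr/local/nagios/libexec and using a made-up address (203.0.113.10) and made-up ports; the function name nrpe_cmdline is hypothetical too:

```shell
#!/bin/sh
# Assumed plugin location; adjust to your installation.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Print the check_nrpe command line for one server behind a customer's firewall.
#   $1 = public address of the customer's firewall (shared by all their servers)
#   $2 = port that the firewall forwards (DNAT) to that server's NRPE daemon
#   $3 = NRPE command name defined on the monitored server
nrpe_cmdline() {
    printf '%s/check_nrpe -H %s -p %s -c %s\n' "$PLUGIN_DIR" "$1" "$2" "$3"
}

# Two servers of the same customer: same address, different forwarded ports.
nrpe_cmdline 203.0.113.10 5666 check_disk
nrpe_cmdline 203.0.113.10 5667 check_disk
```

In the proposed file layout, the address would sit in the per-customer templateserver.cfg and the port in each servern.cfg, so only those two values vary between definitions.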
Re: [Nagios-users] Distributed Monitoring Disabling Checks
Hi,

I'm facing the same issue, and I'd like to find a way to turn off the monitoring on the remote host from the web interface on the main server, because operators are not able to log on to the remote monitoring servers. Does anyone know of an add-on able to do this? I'm thinking about Centreon, but I haven't tested it yet.

Regards,
Gaël.

2009/11/21 Andrew Libby ali...@xforty.com

> From my knowledge, you'll either need to log in to the server running the remote nagios instance and disable checks in the configuration, or turn notifications off at the instance running the web interface. Depending on your needs, it might seem a decent fit to simply turn off notifications yet allow the monitoring to continue.
>
> Andy
>
> Glynne Jones wrote:
>> Hi, I'm in the process of setting up a distributed monitoring system and have hit on an issue where one of my operators wants to disable a check in the web interface but the check is actually being run on the remote distributed system. How can I disable the check on the remote host in this example?
>>
>> Thanks, Glynne
Re: [Nagios-users] Distributed Monitoring Disabling Checks
Hi Gael,

On Fri, Nov 27, 2009 at 11:05 AM, Gael Cheron gael.che...@free.fr wrote:

> I'm facing the same issue, and I'd like to find a way to turn off the monitoring on the remote host from the web interface on the main server, because operators are not able to log on to the remote monitoring servers. Does anyone know of an add-on able to do this? I'm thinking about Centreon, but I haven't tested it yet.

You are right, you can do that through the Centreon web interface. External commands are governed by ACLs, so each user is either allowed to submit them or not. The commands are then sent to the Nagios core, regardless of whether Nagios is installed on the Centreon server or on a remote poller.

Best regards.

-- Romain LE MERLUS | Project Director rlemer...@merethis.com Tel. +33 (0)1 49 69 97 12 Mob. +33(0)6 85 05 02 82 MERETHIS is the publisher of the Centreon software.
Re: [Nagios-users] Distributed Monitoring Disabling Checks
From my knowledge, you'll either need to log in to the server running the remote nagios instance and disable checks in the configuration, or turn notifications off at the instance running the web interface. Depending on your needs, it might seem a decent fit to simply turn off notifications yet allow the monitoring to continue.

Andy

Glynne Jones wrote:

> Hi, I'm in the process of setting up a distributed monitoring system and have hit on an issue where one of my operators wants to disable a check in the web interface but the check is actually being run on the remote distributed system. How can I disable the check on the remote host in this example?
>
> Thanks, Glynne

-- === xforty technologies Andrew Libby ali...@xforty.com http://xforty.com ===
[Nagios-users] Distributed Monitoring Disabling Checks
Hi, I'm in the process of setting up a distributed monitoring system and have hit on an issue where one of my operators wants to disable a check in the web interface but the check is actually being run on the remote distributed system. How can I disable the check on the remote host in this example?

Thanks, Glynne
[Nagios-users] Distributed monitoring question: obsess process flow
Sorry list, I treated my last post as an email and did not observe proper list-fu!

Marc,

Thank you for pointing me to the distributed doc and for your explanations. I feel I am very close. Right now, if I run

    /usr/local/nagios/libexec/eventhandlers/submit_check_result Athena 'PING' OK 'qwrweewfkljewfglkjwegjlwejglwjeglkjwegkwleg'

from the distributed server, the result shows up correctly on the master server. I understand this to mean that the remote server is correct, or close enough, and that the NSCA pipe is correct. However, even with obsess and ocsp_command set properly in the nagios.cfg file on the distributed server, the check results don't seem to be motivated to travel across.

Do I need to set up a different service class with specific options to move the data across the NSCA pipe? What is the process flow of obsess?
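For reference, obsessing is driven by two main-config directives on the distributed server: obsess_over_services must be 1, and ocsp_command must name a command that Nagios then runs after every service check (the per-service obsess_over_service directive also has to be left enabled). A minimal sketch of a sanity check for the first two, assuming nothing about the poster's actual files; the function name check_obsess_cfg is hypothetical and the per-service directive is not inspected:

```shell
#!/bin/sh
# Report whether a nagios.cfg enables service obsessing and names an OCSP command.
# Returns 0 when both settings are present, 1 otherwise.
#   $1 = path to the main Nagios config file
check_obsess_cfg() {
    cfg="$1"
    status=0
    if grep -Eq '^[[:space:]]*obsess_over_services[[:space:]]*=[[:space:]]*1' "$cfg"; then
        echo "obsess_over_services: enabled"
    else
        echo "obsess_over_services: NOT enabled"
        status=1
    fi
    if grep -Eq '^[[:space:]]*ocsp_command[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
        echo "ocsp_command: set"
    else
        echo "ocsp_command: NOT set"
        status=1
    fi
    return "$status"
}
```

It could be run as `check_obsess_cfg /usr/local/nagios/etc/nagios.cfg` (path assumed). If both lines report success and results still do not arrive, the next places to look are the command definition that ocsp_command points at and the arguments it passes to the submission script.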
Re: [Nagios-users] Distributed monitoring solution - some questions
Marc Powell wrote on 25/09/2009 14:14:

> It sounds like you're looking for Freshness Checks. It's discussed in the Distributed Monitoring documentation.

Thanks, Marc. Meanwhile I've read the documentation more carefully, and the freshness threshold does the trick. Thanks!

Simon
[Nagios-users] Distributed Monitoring - Freshness and Latency
Hi,

I am having a problem regarding latency. Here is my technical information:

All OS: Ubuntu 9.03 Server // Nagios 3.0.6
Main server: DL-380, 4 GB RAM, quad-core 3.0 GHz

My distributed Nagios3 satellites are reporting latency, although no CPU, memory, disk or other peaks are evident:

                          Min        Max         Average
Hosts:
  Check Execution Time:   0.03 sec   4.22 sec    1.009 sec
  Check Latency:          0.01 sec   22.67 sec   3.166 sec
Services:
  Check Execution Time:   0.04 sec   0.21 sec    0.099 sec
  Check Latency:          0.01 sec   20.18 sec   2.957 sec

Now I am just starting to configure this satellite, so I only have 29 hosts and 153 services. Due to the latency on the satellite, the freshness checks on the main Nagios are triggering, and this is not good, because my SNMP bandwidth plugins are wreaking havoc on my graphs.

Attached is the output of nagios3 -s /etc/nagios3/nagios.cfg

I tried debugging but everything looks OK... but what do I know, right?

Any help is greatly appreciated!

Regards,
Harald

Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 12-01-2008
License: GPL

Timing information on object configuration processing is listed below. You can use this information to see if precaching your object configuration would be useful.

Object Config Source: Config files (uncached)

OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)
----------------------------------
Read:                   0.009900 sec
Resolve:                0.000208 sec  *
Recomb Contactgroups:   0.000020 sec  *
Recomb Hostgroups:      0.000003 sec  *
Dup Services:           0.001928 sec  *
Recomb Servicegroups:   0.000006 sec  *
Duplicate:              0.000007 sec  *
Inherit:                0.000083 sec  *
Recomb Contacts:        0.000001 sec  *
Sort:                   0.000000 sec  *
Register:               0.001701 sec
Free:                   0.000185 sec
TOTAL:                  0.014044 sec  * = 0.002258 sec (16.08%) estimated savings

RETENTION DATA TIMES
----------------------------------
Read and Process:       0.009276 sec
TOTAL:                  0.009276 sec

Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)
----------------------------------
Object Relationships:   0.000556 sec
Circular Paths:         0.000011 sec  *
Misc:                   0.000302 sec
TOTAL:                  0.000869 sec  * = 0.000011 sec (1.3%) estimated savings

EVENT SCHEDULING TIMES
----------------------------------
Get service info:         0.000955 sec
Get host info info:       0.000003 sec
Get service params:       0.000023 sec
Schedule service times:   0.000986 sec
Schedule service events:  0.000206 sec
Get host params:          0.000002 sec
Schedule host times:      0.000132 sec
Schedule host events:     0.000042 sec
TOTAL:                    0.002349 sec

Projected scheduling information for host and service checks is listed below. This information assumes that you are going to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
----------------------------------
Total hosts:                    29
Total scheduled hosts:          29
Host inter-check delay method:  SMART
Average host check interval:    300.00 sec
Host inter-check delay:         10.34 sec
Max host check spread:          30 min
First scheduled check:          Thu Jan 1 01:00:00 1970
Last scheduled check:           Thu Jan 1 01:00:00 1970

SERVICE SCHEDULING INFORMATION
----------------------------------
Total services:                    153
Total scheduled services:          153
Service inter-check delay method:  SMART
Average service check interval:    271.76 sec
Inter-check delay:                 1.78 sec
Interleave factor method:          SMART
Average services per host:         5.28
Service interleave factor:         6
Max service check spread:          30 min
First scheduled check:             Tue Jul 7 18:45:37 2009
Last scheduled check:              Tue Jul 7 18:49:33 2009

CHECK PROCESSING INFORMATION
----------------------------------
Check result reaper interval:   10 sec
Max concurrent service checks:  Unlimited

PERFORMANCE SUGGESTIONS
----------------------------------
I have no suggestions - things look okay.
Re: [Nagios-users] Distributed Monitoring - Freshness and Latency
I found the problem. It is all the international SNMP bandwidth checks run by check_snmp_int.pl: they are giving the Nagios satellites latencies of up to 17 seconds, with a minimum of 5 seconds. This is sending the whole check queue sky-high. I'll just have to go back to MRTG :(

Or does anyone have a good SNMP option for bandwidth usage monitoring that is compatible with pnp4nagios?

Regards,
Harald

-----Original Message-----
From: Harald Böhmecke harald.boehme...@bertelsmann.de
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring - Freshness and Latency
Date: Tue, 07 Jul 2009 19:44:29 +0200

> Hi,
>
> I am having a problem regarding latency. Here is my technical information:
>
> All OS: Ubuntu 9.03 Server // Nagios 3.0.6
> Main server: DL-380, 4 GB RAM, quad-core 3.0 GHz
>
> My distributed Nagios3 satellites are reporting latency, although no CPU, memory, disk or other peaks are evident:
>
>                           Min        Max         Average
> Hosts:
>   Check Execution Time:   0.03 sec   4.22 sec    1.009 sec
>   Check Latency:          0.01 sec   22.67 sec   3.166 sec
> Services:
>   Check Execution Time:   0.04 sec   0.21 sec    0.099 sec
>   Check Latency:          0.01 sec   20.18 sec   2.957 sec
>
> Now I am just starting to configure this satellite, so I only have 29 hosts and 153 services. Due to the latency on the satellite, the freshness checks on the main Nagios are triggering, and this is not good, because my SNMP bandwidth plugins are wreaking havoc on my graphs.
>
> Attached is the output of nagios3 -s /etc/nagios3/nagios.cfg
>
> I tried debugging but everything looks OK... but what do I know, right?
>
> Any help is greatly appreciated!
>
> Regards,
> Harald
[Nagios-users] Distributed Monitoring Parents
Hi all,

I currently have 1 Master Nagios Server and 4 Nagios Satellites which do the hard work. I have defined all Parents (dependencies) on the Master Server. Do I also need to define the Parents on the Satellites? Or will the Master Server (the one sending out Notifications) automatically define the Unreachable hosts by itself?

Regards,
Harald
Re: [Nagios-users] Distributed Monitoring Parents
If you want the satellites to suppress host/service checks when hosts are unreachable, then yes. Otherwise, your central Nagios master will correctly suppress notifications (as it knows about the dependencies, and the satellites don't do notifications).

On our system, I've defined the dependencies on the satellites as well, because I want to suppress checks of unreachable hosts (with Nagios 2.x it causes horrible latencies when a sector drops out). It's a bit messy though, as it requires host checks to be done on both master and satellite.

Steve

From: Harald Böhmecke [mailto:harald.boehme...@bertelsmann.de]
Sent: Sunday, 5 July 2009 11:54 p.m.
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring Parents

> Hi all,
>
> I currently have 1 Master Nagios Server and 4 Nagios Satellites which do the hard work. I have defined all Parents (dependencies) on the Master Server. Do I also need to define the Parents on the Satellites? Or will the Master Server (the one sending out Notifications) automatically define the Unreachable hosts by itself?
>
> Regards,
> Harald
[Nagios-users] Distributed Monitoring Central Server no status changes
Using Nagios 3.0.5 in a distributed monitoring setup.

Hosts and services show updated status information on the central server, but the status of the host or service never changes from "up" there. Status on the distributed servers is reflected correctly in the web interface. Why might this be?

thanks,
Paul
Re: [Nagios-users] Distributed Monitoring Central Server no status changes
On Feb 25, 2009, at 11:51 AM, Paul Landauer wrote:

> Using nagios 3.0.5 Distributed Monitoring setup. Hosts and Services show updated status information but the status of the host or service does not change from up on the central server. Status on the distributed servers is reflected correctly in the web interface. Why might this be?

Is the central server receiving an "up" status from the distributed server for every update, or is it not receiving or processing the updates at all?

Some things that will help get a better answer are:

- information about how you've architected your distributed setup (i.e. are you using 2+ Nagios instances with NSCA transporting between them, implemented as documented?)
- example host and service definitions from both servers (complete definitions please)
- example status information from both servers for an affected service
- related nagios.log information from both servers
- the contents of your check result submission script if it's not exactly like the documented one.

Running Nagios and/or NSCA in debug mode on the central server might provide additional information.

-- Marc
Re: [Nagios-users] Distributed Monitoring Central Server no status changes
Hi Paul,

Please always respond on list so that others, now and in the future, can learn from your experience and so that you can benefit from the experience of others on the list. More below...

On Feb 25, 2009, at 12:54 PM, Paul Landauer wrote:
> On Wed, 2009-02-25 at 12:06 -0600, Marc Powell wrote:
>
> I'm using 2 servers following the documentation at http://nagios.sourceforge.net/docs/3_0/distributed.html

Thanks.

>> - example host and service definitions from both servers (complete definitions please)
>
> Definitions are the same on both servers. Example host definition:
>
>     define host{
>         use                     generic-host
>         host_name               surf
>         alias                   Surf Control
>         address                 ip_address_of_surf_is_here
>         max_check_attempts      5
>         check_command           check-host-alive
>         check_interval          5
>         retry_interval          1
>         check_period            24x7
>         contact_groups          admins
>         notification_interval   30
>         notification_period     24x7
>         notification_options    d,u,r
>         }
>
> Example service definitions (surf is a member of sunrise_windows_servers):
>
>     define service{
>         use                     generic-service
>         hostgroup_name          sunrise_windows_servers
>         service_description     NSClient++ Version
>         check_command           check_nt!CLIENTVERSION
>         }

For future reference, these are not 'complete' since you use templates. There's lots of important information within those templates that's needed when troubleshooting as well. I expect that the definitions are indeed different between the servers once you take the templates into account; otherwise your central server would be doing active checks of the services in addition to receiving the passive checks, overwriting their results. (I don't think this is the problem.)

>> - related nagios.log information from both servers
>
> I included excerpts that I thought applied. If you'd like the whole log, let me know.
>
> Nagios.log for Distributed server:
>
>     [1235575724] SERVICE ALERT: surf;Explorer;CRITICAL;HARD;3;Explorer.exe: not running
>     [1235575724] SERVICE NOTIFICATION: nagiosadmin;surf;Explorer;CRITICAL;notify-service-by-email;Explorer.exe: not running
>
> Nagios.log for Central Server:
>
>     [1235575777] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;surf;Explorer;0;Explorer.exe: not running
>     [1235575778] PASSIVE SERVICE CHECK: surf;Explorer;0;Explorer.exe: not running

This is interesting and useful. As you can see, on your distributed server the state is CRITICAL (numeric code 2; the 3 in that log line is the check attempt number), but by the time NSCA dumps it into the command pipe on the central server, it has been translated to 0 (OK) by something in the process. This could be because Nagios isn't passing the correct status code to your submission script, because your submission script is not interpreting it or passing it to send_nsca correctly, or because NSCA on the receiving side isn't reading it correctly.

>> - the contents of your check result submission script if it's not exactly like the documented one.
>
>     printfcmd=/usr/bin/printf
>     NscaBin=/usr/bin/send_nsca
>     NscaCfg=/etc/nagios/send_nsca.cfg
>     NagiosHost=I_have_the_ip_address_of_my_central_server_here
>     # Fire the data off to the NSCA daemon using the send_nsca script
>     $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" | $NscaBin -H $NagiosHost -p 5 721 -c $NscaCfg

To say whether this is correct or not I'd have to see your OCSP command definition. If you're using the $SERVICESTATE$ macro, then this is broken. send_nsca expects a numeric state code, but $SERVICESTATE$ provides the state name (OK, CRITICAL, etc.). Normally the submission script translates that to the proper numeric code first, but you can also use the $SERVICESTATEID$ macro instead to get the numeric code directly. My bets are on this being the problem.

>> Running nagios and/or NSCA in debug mode on the central server might provide additional information.
>
> Let me know if you still want this to be done.

Running NSCA in debug mode to see whether it's receiving the 0 status code from the distributed machine would further narrow down the source of the problem.

-- Marc
Re: [Nagios-users] Distributed Monitoring Central Server no status changes
On Feb 25, 2009, at 12:54 PM, Paul Landauer wrote: On Wed, 2009-02-25 at 12:06 -0600, Marc Powell wrote: I'm using 2 servers following the documentation at http://nagios.sourceforge.net/docs/3_0/distributed.html Thanks. - example host and service definitions from both servers (complete definitions please) Definitions are the same on both servers. Example host definition: define host{ use generic-host host_name surf alias Surf Control address ip_address_of_surf_is_here max_check_attempts 5 check_command check-host-alive check_interval 5 retry_interval 1 check_period24x7 contact_groups admins notification_interval 30 notification_period 24x7 notification_optionsd,u,r } Example Service Definitions (surf is a member of sunrise_windows_servers): define service{ use generic-service hostgroup_name sunrise_windows_servers service_description NSClient++ Version check_command check_nt!CLIENTVERSION } For future reference, these are not 'complete' since you use templates. There's lots of important information within those templates that's needed when troubleshooting as well. I expect that the definitions are indeed different between the servers when you take the templates into account otherwise your central server is doing active checks of the services in addition to receiving the passive checks, overwriting their results. (I don't think this is the problem). - related nagios.log information from both servers I included excerpts that I thought applied. If you'd like the whole log, let me know. Nagios.log for Distributed server: [1235575724] SERVICE ALERT: surf;Explorer;CRITICAL;HARD; 3;Explorer.exe: not running [1235575724] SERVICE NOTIFICATION: nagiosadmin;surf;Explorer;CRITICAL;notify-service-by- email;Explorer.exe: not running Nagios.log for Central Server: [1235575777] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;surf;Explorer;0;Explorer.exe: not running [1235575778] PASSIVE SERVICE CHECK: surf;Explorer;0;Explorer.exe: not running This is interesting and useful. 
As you can see, on your distributed server, the status is 3 (CRITICAL) but by the time NSCA dumps it into the command pipe on the central server, that has been translated to 0 (OK) by something in the process. This could be because nagios isn't passing the correct status code to your submission script, your submission script is not interpreting or passing it to send_nsca correctly or nsca on the receiving side isn't reading it correctly. - the contents of your check result submission script if it's not exactly like the documented one. printfcmd=/usr/bin/printf NscaBin=/usr/bin/send_nsca NscaCfg=/etc/nagios/send_nsca.cfg NagiosHost=I_have_the_ip_address_of_my_central_server_here # Fire the data off to the NSCA daemon using the send_nsca script $printfcmd %s\t%s\t%s\t%s\n $1 $2 $3 $4 | $NscaBin -H $NagiosHost -p 5 721 -c $NscaCfg To say whether this is correct or not I'd have to see your OCSP command definition. If you're using the $SERVICESTATE$ macro, then this is broken. send_nsca expects a numeric state code but $SERVICESTATE$ provides a grammatical code (OK, CRITICAL, etc). Normally that needs to be translated to the proper numeric by the submission script first but you can also use the $SERVICESTATEID$ macro instead to get the numeric code. My bets are on this being the problem. Running nagios and/or NSCA in debug mode on the central server might provide additional information. Let me know if you still want this to be done. Running NSCA in debug to see if it's receiving the 0 status code from the distributed machine would further narrow down the source of the problem. -- Marc Marc, You are correct sir! I changed $SERVICESTATE$ to $SERVICESTATEID$ on the distributed server and the central server is updating properly. I imagine that I'll need to use $HOSTSTATEID$ instead of $HOSTSTATE$ as well. 
paul -- Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H ___ Nagios-users mailing list Nagios-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null
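A minimal sketch of the fix discussed in this thread, assuming a submission script along the lines of the one quoted above. The function names `state_to_code` and `format_service_result` are hypothetical; the only point is that the third field handed to send_nsca has to be numeric, whether it comes straight from $SERVICESTATEID$ or is translated from $SERVICESTATE$ inside the script.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios service state name to the numeric
# return code send_nsca expects. Numeric input is passed through.
state_to_code() {
    case "$1" in
        OK|0)       echo 0 ;;
        WARNING|1)  echo 1 ;;
        CRITICAL|2) echo 2 ;;
        *)          echo 3 ;;   # UNKNOWN and anything unrecognised
    esac
}

# Build the tab-separated line send_nsca reads on stdin:
# host <TAB> service description <TAB> return code <TAB> plugin output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Example use inside a submission script (not executed here):
# format_service_result "$1" "$2" "$3" "$4" | $NscaBin -H $NagiosHost -c $NscaCfg
```

With a translation step like this, the script behaves the same whether the OCSP command passes $SERVICESTATE$ or $SERVICESTATEID$.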
Re: [Nagios-users] Distributed monitoring without direct network connection
-----Original Message-----
From: Andreas Ericsson [mailto:[EMAIL PROTECTED]
Sent: 29 November 2008 14:01
To: Nick Lunt
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring without direct network connection

Nick Lunt wrote:

Hi folks

nagios 3.0.5 on RHEL 4u6. We have nagios servers all over the UK and we want to get all alerts from each nagios server to a central nagios server at our main offices. We do not have permanent network connectivity to the remote nagios servers so using NSCA is not an option. Has anyone any idea of how to overcome this problem?

Queue the events that were unsendable and send them when it becomes possible. Merlin is designed to handle frequently failing links with sometimes extremely long downtimes (it already does this), but it's not really production-level stable yet, so I wouldn't recommend using it for this (unless you're interested in completing it yourself or sponsoring me or op5 to do it for you, of course). More about merlin at http://git.op5.org/git/nagios/merlin.git

pnsca, another module available there, can probably be trivially rewritten to stash alerts and whatnot with very good performance.

I am thinking of getting the remote nagios servers to send email alerts to an account on the central nagios server then trying to get an alert generated based on the contents of the email, has anyone tried this before? Or does anyone have any better ideas for solving this problem?

That depends on what your end-goal is, really. Do you want only one server to send notifications, or do you want your central server to be able to generate reports from the data sent in from the slave systems?

If only one server should send notifications, I'd recommend using a solution with lower latency than gathering everything and shipping it as an email. One-way UDP communication would be one solution here, I guess, but it does require the network to be physically present at all times (and there's no failure detection whatsoever, as UDP is a fire-and-forget protocol). Merlin would help in this case (although it can't send over UDP yet).

If it's for reporting reasons, you'd be better off sending the logfiles as emails when they're being rotated and then merging them together on the master server. That means you can't get *accurate* reports more often than the logs are rotated, but since you'll need to sort-merge them anyways, that's still going to be a problem. Neither merlin nor NSCA can help here, I'm afraid, as entries in the logs would get completely jumbled unless you sort-merge them before generating reports from them.

Thanks for the detailed info Andreas. I still think the nagios event -> email -> nagios server is the only realistic solution. It's not perfect as mail servers can fail and mail can get delayed but it's the best we can do at the moment.

Cheers, Nick.
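The advice quoted above, queueing results that could not be sent and sending them once the link is available, could be sketched roughly as follows. This is an illustration only: the spool path and the function names `queue_result` and `flush_spool` are hypothetical, and it assumes the sender command (for example send_nsca with its usual options) accepts several result lines on stdin.

```shell
#!/bin/sh
# Hypothetical spool file; override SPOOL to put it somewhere else.
SPOOL="${SPOOL:-/tmp/nsca_spool.txt}"

# Append one tab-separated service result to the spool.
queue_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" >> "$SPOOL"
}

# Feed the whole spool to the sender command given as arguments.
# The spool is emptied only if the sender exits successfully, so
# results survive a failed attempt and are retried next time.
flush_spool() {
    [ -s "$SPOOL" ] || return 0
    if "$@" < "$SPOOL"; then
        : > "$SPOOL"
    else
        return 1
    fi
}
```

A cron job or the VPN dial-up script could call `flush_spool` with the real sender command. Note that a result queued between the send and the truncation would be lost; a production version would need locking or a rename-then-send step.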
Re: [Nagios-users] Distributed monitoring without direct networkconnection
-----Original Message-----
From: Nick Lunt [mailto:[EMAIL PROTECTED]
Sent: Monday, December 01, 2008 6:01 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring without direct networkconnection

Thanks for the detailed info Andreas. I still think the nagios event -> email -> nagios server is the only realistic solution. It's not perfect as mail servers can fail and mail can get delayed but it's the best we can do at the moment.

Cheers, Nick.

Email isn't horrible, but it's not optimal. We wrote a script that checks a POP account for email every morning to ensure the daily reports arrived as expected.

-Mike
Re: [Nagios-users] Distributed monitoring without direct networkconnection
Nick Lunt wrote:

It's not that the connections will be up/down, it's more that they simply won't be there. Most of our clients are NHS (hospitals) and we have to have a secure VPN connection that we dial into on an as-needed basis. We currently just send alerts as emails to our support account, but the company is getting bigger and bigger so monitoring the support inbox is becoming a massive chore. We really want a central nagios server with the web frontend on a big flat screen on the wall :)

Set up dedicated links. With VPNs this should not cost you an arm and a leg. You try to fix the wrong problem in my view.

Hugo.

-- 
[EMAIL PROTECTED] http://hugo.vanderkooij.org/
Re: [Nagios-users] Distributed monitoring without direct networkconnection
Hugo van der Kooij wrote:

Setup dedicated links. With VPN's this should not cost you an arm and a leg. You try to fix the wrong problem in my view.

Since the targeted organizations are hospitals, I think it's legal matters rather than competence that make dedicated links a showstopper.

-- 
Andreas Ericsson [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Re: [Nagios-users] Distributed monitoring without direct network connection
Nick Lunt wrote:

Hi folks

nagios 3.0.5 on RHEL 4u6. We have nagios servers all over the UK and we want to get all alerts from each nagios server to a central nagios server at our main offices. We do not have permanent network connectivity to the remote nagios servers so using NSCA is not an option. Has anyone any idea of how to overcome this problem?

Queue the events that were unsendable and send them when it becomes possible. Merlin is designed to handle frequently failing links with sometimes extremely long downtimes (it already does this), but it's not really production-level stable yet, so I wouldn't recommend using it for this (unless you're interested in completing it yourself or sponsoring me or op5 to do it for you, of course). More about merlin at http://git.op5.org/git/nagios/merlin.git

pnsca, another module available there, can probably be trivially rewritten to stash alerts and whatnot with very good performance.

I am thinking of getting the remote nagios servers to send email alerts to an account on the central nagios server then trying to get an alert generated based on the contents of the email, has anyone tried this before? Or does anyone have any better ideas for solving this problem?

That depends on what your end-goal is, really. Do you want only one server to send notifications, or do you want your central server to be able to generate reports from the data sent in from the slave systems?

If only one server should send notifications, I'd recommend using a solution with lower latency than gathering everything and shipping it as an email. One-way UDP communication would be one solution here, I guess, but it does require the network to be physically present at all times (and there's no failure detection whatsoever, as UDP is a fire-and-forget protocol). Merlin would help in this case (although it can't send over UDP yet).

If it's for reporting reasons, you'd be better off sending the logfiles as emails when they're being rotated and then merging them together on the master server. That means you can't get *accurate* reports more often than the logs are rotated, but since you'll need to sort-merge them anyways, that's still going to be a problem. Neither merlin nor NSCA can help here, I'm afraid, as entries in the logs would get completely jumbled unless you sort-merge them before generating reports from them.

-- 
Andreas Ericsson [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
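The sort-merge step mentioned above can be illustrated with a short sketch. It assumes each rotated log is already in timestamp order and that every line begins with a bracketed epoch timestamp, as in the usual nagios.log format; the function name `merge_nagios_logs` is hypothetical, and `-s` (stable) is a common sort extension rather than strict POSIX.

```shell
#!/bin/sh
# Merge Nagios log files that are each already in timestamp order.
# Lines look like "[1235575724] SERVICE ALERT: ...", so the sort key is
# the number starting at the second character of the first field.
# -m merges presorted inputs, -n compares numerically, and -s keeps
# lines with equal timestamps in their original relative order.
merge_nagios_logs() {
    sort -m -s -n -k1.2,1 "$@"
}
```

Usage would be along the lines of `merge_nagios_logs siteA.log siteB.log > merged.log`, with the file names standing in for whatever the rotated logs are called.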
[Nagios-users] Distributed monitoring without direct network connection
Hi folks

nagios 3.0.5 on RHEL 4u6. We have nagios servers all over the uk and we want to get all alerts from each nagios server to a central nagios server at our main offices. We do not have permanent network connectivity to the remote nagios servers so using NSCA is not an option. Has anyone any idea of how to overcome this problem ?

I am thinking of getting the remote nagios servers to send email alerts to an account on the central nagios server then trying to get an alert generated based on the contents of the email, has anyone tried this before ? Or does anyone have any better ideas for solving this problem ?

Kind Regards
Nick Lunt
Managed Services and O/S Analyst
Patech Solutions Limited
Tel: 01543 444 707 Fax: 01543 444 709
Tame House, Fradley Park, Lichfield, Staffordshire, WS13 8RZ
www.patech-solutions.com http://www.patech-solutions.com/home.htm
Re: [Nagios-users] Distributed monitoring without direct network connection
Try looking at Opsview; it does what you want, with nagios as a component in its setup.

http://www.opsview.org/

On Friday 28 November 2008 11:53:28 Nick Lunt wrote:

Hi folks

nagios 3.0.5 on RHEL 4u6. We have nagios servers all over the UK and we want to get all alerts from each nagios server to a central nagios server at our main offices. We do not have permanent network connectivity to the remote nagios servers so using NSCA is not an option. Has anyone any idea of how to overcome this problem?

I am thinking of getting the remote nagios servers to send email alerts to an account on the central nagios server then trying to get an alert generated based on the contents of the email, has anyone tried this before? Or does anyone have any better ideas for solving this problem?

-- 
Assaf Flatto
SSP Ops Team
Linux System Administrator
Re: [Nagios-users] Distributed monitoring without direct network connection
Thanks for the plug, but Opsview is not really suitable for this. We have distributed monitoring out-of-the-box, but it requires a permanent connection between the slaves and the master, which Nick says he hasn't got.

If you will have temporary connections (that go up and down), we have been thinking about whether we can do batched results from slaves, though this is quite a big job as it requires changes to a lot of components (Nagios, NSCA, NDOutils, and our data warehouse). Let us know if you fancy sponsoring this work.

Otherwise, why can't you just have notifications from the other servers? Do you need to correlate with your central server?

Ton

On 28 Nov 2008, at 14:20, Assaf Flatto wrote:

Try looking at opsview , it is doing what you want with nagios as a component in his setup. http://www.opsview.org/
Re: [Nagios-users] Distributed monitoring without direct networkconnection
Hi Ton,

Thanks for the info. It's not that the connections will be up/down, it's more that they simply won't be there. Most of our clients are NHS (hospitals) and we have to have a secure VPN connection that we dial into on an as-needed basis.

We currently just send alerts as emails to our support account, but the company is getting bigger and bigger so monitoring the support inbox is becoming a massive chore. We really want a central nagios server with the web frontend on a big flat screen on the wall :)

I'm setting up a filter in postfix on the central nagios server so that all emails coming in will go through the filter; the filter will run a script to call send_nsca.

So I'll have

nagios clients -> nagios -> send_email -> primary nagios server -> postfix -> mail filter -> send_nsca -> nagios

If I get this working I'll treat myself to curry :)
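The mail filter step in the chain described above might look something like the sketch below. Everything about the email layout is an assumption: it supposes the remote servers' notification command writes "Host:", "Service:", "State:" and "Info:" lines into the body, which is purely hypothetical and would need to match the real notification template. The function name `mail_to_nsca_line` is also made up for illustration.

```shell
#!/bin/sh
# Read one notification email on stdin and print a send_nsca input line.
# ASSUMPTION: the body carries "Host:", "Service:", "State:" and "Info:"
# lines; this layout is hypothetical and must match whatever the remote
# servers' notification command actually writes.
mail_to_nsca_line() {
    awk '
        /^Host: /    { host  = substr($0, 7) }
        /^Service: / { svc   = substr($0, 10) }
        /^State: /   { state = substr($0, 8) }
        /^Info: /    { info  = substr($0, 7) }
        END {
            code = 3                          # default: UNKNOWN
            if (state == "OK")       code = 0
            if (state == "WARNING")  code = 1
            if (state == "CRITICAL") code = 2
            if (host == "" || svc == "") exit 1   # not a usable alert
            printf "%s\t%s\t%d\t%s\n", host, svc, code, info
        }'
}
```

The postfix filter would pipe each incoming message through `mail_to_nsca_line` and hand the resulting line to send_nsca; messages that do not parse make the function exit non-zero so they can be set aside rather than submitted.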
Re: [Nagios-users] Distributed monitoring without direct networkconnection
On 28 Nov 2008, at 16:00, Nick Lunt wrote:

So I'll have nagios clients -> nagios -> send_email -> primary nagios server -> postfix -> mail filter -> send_nsca -> nagios

If I get this working I'll treat myself to curry :)

You deserve a vindaloo.

So basically, you are using email from the disparate nagios systems as a delivery mechanism to send state change data to the master. Can't see why that wouldn't work (it's just like getting trap or log event data). Using email provides you with resilience and retries, though with high latency. Better hope there's not a network/email outage!

Ton
Re: [Nagios-users] Distributed monitoring without direct networkconnection
If I get this working I'll treat myself to curry :)

Try a coconut milk + pineapple curry. Serve with ginger salad. Little closer to heaven.

~BAS
[Nagios-users] distributed monitoring host checking question
Hi,

I am working on setting up a distributed monitoring system with Nagios (actually Groundwork). I have 3 child servers and 1 parent server, using NSCA to send passive check results from the children to the parent server. My question is about how Nagios (version 2.5) will behave when an on demand host check needs to be run.

So for example: Host A is configured with check_host_alive ( a simple ping ) as its host check command on the parent server. It is also configured with Service A, say an SNMP check. Active host checks are not disabled on the parent server, but active service checks are. Host A, obviously, is also configured on the child server.

When the child server sends a passive check result up to the parent saying that the SNMP check has failed, will the parent server then run the on-demand host check command to verify that Host A is still up? If not, how do I get that information up to the parent? Are passive host checks my only option?

So I guess the question is this: In a distributed monitoring setup, will a parent server run an on-demand host check for a host that gets a report (via a passive service check sent from a child server) of a service being critical?

Thanks,
Tom

-- 
Tom Ammon
Network Engineer
Business Card at http://tomsbox.net/bizcard_TomAmmon.jpg
Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu
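If passive host checks turn out to be the route taken, a child server would submit host results much like service results. The sketch below only shows the shape of such a submission and assumes send_nsca is the transport, as in the service case; the helper names are hypothetical, and it does not settle whether the parent runs its own on-demand host check.

```shell
#!/bin/sh
# Hypothetical helpers: format a passive HOST check result for send_nsca.
# Host results have three tab-separated fields (no service description):
# host <TAB> return code <TAB> plugin output
host_state_to_code() {
    case "$1" in
        UP|0)          echo 0 ;;
        DOWN|1)        echo 1 ;;
        UNREACHABLE|2) echo 2 ;;
        *)             return 1 ;;   # refuse unknown states
    esac
}

format_host_result() {
    code="$(host_state_to_code "$2")" || return 1
    printf '%s\t%s\t%s\n' "$1" "$code" "$3"
}
```

On the child this would typically be driven by a host-side counterpart of the service submission command, with the parent configured to accept passive host checks.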
[Nagios-users] Distributed monitoring with distributed server running Windows
Hello all,

I have Nagios up and running and have made myself familiar with nrpe and nsca. I have read and think I understand the Nagios distributed monitoring setup quite clearly. The issue I have now is that I need to setup a distributed monitoring setup where the distributed server will be running Windows.

In short, I have a Linux server as the central server, I have a distributed monitoring server running Windows. I can monitor the distributed server itself quite well using nsca with nsclient++. I can also monitor it successfully using nsca with nsclient++. I have some more windows machines that I need to monitor but they cannot contact the central Nagios server directly, they have to come via the distributed server.

Does anyone know of any tools that can enable my distributed server running Windows to act either as an nsca server to receive data from leaf nodes via nsca or as an nrpe server to get data from leaf nodes via nrpe and then send those alerts onto my central Linux box via nsca?

Thanks in advance for any help,

-- 
--Moby
Re: [Nagios-users] distributed monitoring - slave server not that intelligent
mark redding wrote:

Hi all,

I currently have Nagios 2.10 installed on a couple of machines, one of which is configured as a master and the other as a slave. I have a script running on the slave which rsyncs the configs from the master and performs health checks of the master to see that it is running (and if it is not then it enables service checks/notifications on the slave until such time as it detects that the master is back up and running). I also use nsca to pass passive checks to the slave to ensure that it has up-to-date information about services. The slave does not perform any active service checks, nor are notifications enabled unless the master is down.

I do however still have one problem, and that is that the slave has no way of knowing when we've ack'ed a critical, scheduled downtime, or disabled/enabled notifications/event handlers/checks for a service/host on the master. What this means is that if we schedule downtime on a host, then the master goes down, the slave starts bitching about the host that is down (because it does not know that it's in downtime). A similar problem occurs if we disable an event handler on the master, because unless the slave also knows to disable the event handler it will fire it (regardless of whether or not it is active) as soon as the passive check result returns a critical.

At present I am getting round this by tailing the nagios log file through a perl script that looks for specific 'EXTERNAL COMMAND' entries and then flushes those through to the slave by ssh'ing to the slave and writing the command string to the nagios pipe file on the slave. Is there a better way of doing this?

You might get lucky using the attached NEB module. It's not well documented, and it's not very well tested. It will do what you're after though. Contact me off-list if you run into problems. I've been looking for someone to test this for quite some time now, so I'll be happy to help.

It's written to make the two servers load-balanced, so the slave and the master will help each other out doing checks and then transmit them to one another. External commands are also copied from one to the other, so scheduled/cancelled downtime etc. will instantly show up on both servers as soon as it's parsed in one. If you don't want the host/service check syncing you'll have to either get clever with the config or manually hack that out of the module.

Like I said: feel free to contact me off-list if you're having any problems with it.

-- 
Andreas Ericsson [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

mrm-0.1.tar.gz
Description: GNU Zip compressed data
[Nagios-users] distributed monitoring - slave server not that intelligent
Hi all,

I currently have Nagios 2.10 installed on a couple of machines, one of which is configured as a master and the other as a slave. I have a script running on the slave which rsyncs the configs from the master and performs health checks of the master to see that it is running (if it is not, the script enables service checks/notifications on the slave until it detects that the master is back up and running). I also use NSCA to pass passive checks to the slave to ensure that it has up-to-date information about services. The slave does not perform any active service checks, nor are notifications enabled, unless the master is down.

I do, however, still have one problem: the slave has no way of knowing when we've ack'ed a critical, scheduled downtime, or disabled/enabled notifications/event handlers/checks for a service/host on the master. This means that if we schedule downtime on a host and the master then goes down, the slave starts bitching about the host that is down (because it does not know that it's in downtime). A similar problem occurs if we disable an event handler on the master: unless the slave also knows to disable the event handler, it will fire it (regardless of whether or not it is active) as soon as the passive check result returns a critical.

At present I am getting round this by tailing the Nagios log file through a Perl script that looks for specific 'EXTERNAL COMMAND' entries and then flushes those through to the slave by ssh'ing to the slave and writing the command string to the Nagios pipe file on the slave.

Is there a better way of doing this?

--
bright blessings,
Mark
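For readers who want to see the shape of the log-tailing workaround Mark describes, here is a minimal sketch in Python (his script is Perl and was not posted, so this is not his code). It assumes log lines of the form `[timestamp] EXTERNAL COMMAND: NAME;args` and rewrites them into the `[timestamp] NAME;args` form that the Nagios external command file accepts. The allow-list `FORWARDED` and the function name `to_command_line` are illustrative choices, not part of Nagios. Passive check results are deliberately not forwarded, since they already reach the slave via NSCA.

```python
import re

# Matches Nagios log entries such as:
#   [1200000000] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;web01;...
LOG_LINE = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z_]+)(?:;(.*))?$")

# Hypothetical allow-list of commands worth mirroring to the slave.
FORWARDED = {
    "ACKNOWLEDGE_HOST_PROBLEM", "ACKNOWLEDGE_SVC_PROBLEM",
    "SCHEDULE_HOST_DOWNTIME", "SCHEDULE_SVC_DOWNTIME",
    "ENABLE_HOST_NOTIFICATIONS", "DISABLE_HOST_NOTIFICATIONS",
    "ENABLE_SVC_NOTIFICATIONS", "DISABLE_SVC_NOTIFICATIONS",
    "ENABLE_SVC_EVENT_HANDLER", "DISABLE_SVC_EVENT_HANDLER",
    "ENABLE_SVC_CHECK", "DISABLE_SVC_CHECK",
}


def to_command_line(log_line):
    """Return a command-file line for a forwardable log entry, else None."""
    match = LOG_LINE.match(log_line.strip())
    if match is None:
        return None  # not an EXTERNAL COMMAND entry
    timestamp, command, args = match.groups()
    if command not in FORWARDED:
        return None  # e.g. PROCESS_SERVICE_CHECK_RESULT, already sent via NSCA
    if args is None:
        return f"[{timestamp}] {command}"
    return f"[{timestamp}] {command};{args}"
```

Each non-None line would then be appended to the slave's command pipe, for example over ssh as described above. Commands that carry server-local IDs (such as downtime deletions) are left out of the allow-list because the IDs are not guaranteed to match between the two servers.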
[Nagios-users] Distributed monitoring with nrpe_nt and freshness
We have our monitoring configured and everything is working great, checking all our Windows servers through a single Windows server running nrpe_nt. The problem we are having is when one of our Linux Nagios servers goes down and doesn't send any results to the master Nagios server. When this happens and our 5-minute freshness threshold is reached, we start running active checks because we didn't receive any passive updates from the server that went down. This sends a bunch of checks to the Windows server to run tests, and we start getting UNKNOWN status reports back on the master server with the result "No output available from command".

Does anyone know if there is a maximum connection limit on nrpe_nt, or something else that may be causing this?

Thank you,
Jeff
Re: [Nagios-users] Distributed monitoring with nrpe_nt and freshness
Everything works fine checking the hosts if I force an active check for all services on a host. We are not doing host checks at all on our servers, just service checks. The only time I have a problem is when the freshness threshold is reached and it tries to force a check on a lot of services at once. It is almost as if nrpe_nt is only able to process a set number of checks at one time. There is no resource issue at the time this is happening, either on the Nagios server or on the Windows server running the checks.

Has anyone else had this problem?

Thank you,
Jeff

-----Original Message-----
From: Thomas Guyot-Sionnest [mailto:[EMAIL PROTECTED]]
Sent: Sunday, November 04, 2007 1:36 PM
To: Jeff Shumard - DefenseWeb Technologies
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring with nrpe_nt and freshness

On 02/11/07 01:11 PM, Jeff Shumard - DefenseWeb Technologies wrote:
> We have our monitoring configured and everything is working great, checking all our Windows servers through a single Windows server running nrpe_nt. The problem we are having is when one of our Linux Nagios servers goes down and doesn't send any results to the master Nagios server. When this happens and our 5-minute freshness threshold is reached, we start running active checks because we didn't receive any passive updates from the server that went down. This sends a bunch of checks to the Windows server to run tests, and we start getting UNKNOWN status reports back on the master server with the result "No output available from command".
>
> Does anyone know if there is a maximum connection limit on nrpe_nt, or something else that may be causing this?

While I can't answer your question, I can suggest using check_dummy to set an UNKNOWN status on hosts that are not monitored. It especially makes sense if some of the hosts can't be monitored directly from the central server.

Also, are you sure the central server is allowed to talk to your nrpe_nt (IP access list)?

Thomas
Re: [Nagios-users] Distributed monitoring with nrpe_nt and freshness
On 02/11/07 01:11 PM, Jeff Shumard - DefenseWeb Technologies wrote:
> We have our monitoring configured and everything is working great, checking all our Windows servers through a single Windows server running nrpe_nt. The problem we are having is when one of our Linux Nagios servers goes down and doesn't send any results to the master Nagios server. When this happens and our 5-minute freshness threshold is reached, we start running active checks because we didn't receive any passive updates from the server that went down. This sends a bunch of checks to the Windows server to run tests, and we start getting UNKNOWN status reports back on the master server with the result "No output available from command".
>
> Does anyone know if there is a maximum connection limit on nrpe_nt, or something else that may be causing this?

While I can't answer your question, I can suggest using check_dummy to set an UNKNOWN status on hosts that are not monitored. It especially makes sense if some of the hosts can't be monitored directly from the central server.

Also, are you sure the central server is allowed to talk to your nrpe_nt (IP access list)?

Thomas
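To make Thomas's suggestion concrete: a plugin used as the "stale result" command does not contact the remote host at all; it only prints one line and exits with a fixed state code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is a Python stand-in that mimics that contract, assuming the usual `STATE: text` output style. It is not the check_dummy source, and `dummy_result` is a name invented for this illustration.

```python
# Standard Nagios plugin return codes and their labels.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def dummy_result(state, text=""):
    """Return (exit_code, output_line) without touching the network."""
    if state not in STATES:
        raise ValueError(f"state must be one of {sorted(STATES)}")
    label = STATES[state]
    output = f"{label}: {text}" if text else label
    return state, output
```

Used as the check command for services fed by passive results, something like this means a freshness timeout produces one cheap UNKNOWN per service instead of a burst of real NRPE connections to the Windows server. A wrapper script would print the output line and exit with the returned code.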
Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering
Hi Sean,

On Mon, 15 Oct 2007, Sean McAvoy wrote:
> On further investigation, it looks as though the problem is with the time taken to submit the results back to Nagios via send_nsca. I have read about a couple of different options for getting results back quickly. One is a bulk transfer: a file containing the results is sent via send_nsca, executed from cron. The other is a system that makes use of the performance data output option on the remote Nagios systems and submits the results using a custom daemon on both ends. Does anybody know of any other options? Also, are there any guides to setting up either of these options? Most of what I have read is email threads.
>
> Thanks.
>
> On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
>> Hello,
>>
>> I have 1 central Nagios system with 5 distributed servers. I have enabled freshness checking on both the central and remote systems. I am constantly seeing services go to UNKNOWN status for 1-3 minutes and then recover. On the remotes I have:
>>
>> check_service_freshness=1
>> service_freshness_check_interval=10
>> check_host_freshness=1
>> host_freshness_check_interval=60
>> service_inter_check_delay_method=s
>> max_service_check_spread=10
>> service_interleave_factor=1
>> host_inter_check_delay_method=s
>> max_host_check_spread=30
>> max_concurrent_checks=0
>>
>> It does appear as though checks are being run in parallel. I'm wondering how I can best determine where the problem is: with the execution of checks, submittal to the central system, or something else.
>>
>> Thanks.
>> _sean
>
> Sean McAvoy
> NOC Acting Team Lead
> Afilias Canada
> P. 416.673.4194

This may be the caching possibility you have already mentioned, but here is a blog posting about caching send_nsca: http://altinity.blogs.com/dotorg/2006/11/caching_nsca_da.html

This is in the back of my mind for us down the road, but I have not looked into it personally, just seen the post. I have just started looking at what Opsview has to offer.

Thanks,
Ivan.
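The bulk-transfer option Sean mentions can be sketched as follows. send_nsca reads one result per line from standard input, with tab-separated fields (host, service description, return code, plugin output; the service field is omitted for host results), so many results can be sent over a single connection instead of one send_nsca invocation per check. The Python below is only an illustration under those assumptions: `format_nsca_batch` and `send_batch` are invented names, and the binary and config paths are placeholders for whatever your installation uses.

```python
import subprocess


def format_nsca_batch(results):
    """Build send_nsca stdin: one tab-separated line per passive result."""
    lines = []
    for result in results:
        fields = [result["host"]]
        if result.get("service"):
            fields.append(result["service"])  # omitted for host check results
        fields.append(str(result["code"]))
        # Tabs and newlines in plugin output would break the line format.
        fields.append(result["output"].replace("\t", " ").replace("\n", " "))
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n" if lines else ""


def send_batch(results, central_host,
               send_nsca="/usr/local/nagios/bin/send_nsca",      # assumed path
               config="/usr/local/nagios/etc/send_nsca.cfg"):    # assumed path
    """Send all queued results in one send_nsca call; returns its exit code."""
    payload = format_nsca_batch(results)
    if not payload:
        return 0
    proc = subprocess.run([send_nsca, "-H", central_host, "-c", config],
                          input=payload, text=True, capture_output=True)
    return proc.returncode
```

A cron job or small daemon on each distributed server would collect results into a list (or spool file) and call `send_batch` periodically. The trade-off is latency: results wait for the next batch, so the batch interval has to stay comfortably below the freshness threshold on the central server.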
Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering
Sean;

I have a very large deployment so I use this tool: http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal NSCA daemon listens on the central server.

Jonathan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED]] On Behalf Of Sean McAvoy
Sent: Monday, October 15, 2007 12:09 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

On further investigation, it looks as though the problem is with the time taken to submit the results back to Nagios via send_nsca. I have read about a couple of different options for getting results back quickly. One is a bulk transfer: a file containing the results is sent via send_nsca, executed from cron. The other is a system that makes use of the performance data output option on the remote Nagios systems and submits the results using a custom daemon on both ends. Does anybody know of any other options? Also, are there any guides to setting up either of these options? Most of what I have read is email threads.

Thanks.

On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
> Hello,
>
> I have 1 central Nagios system with 5 distributed servers. I have enabled freshness checking on both the central and remote systems. I am constantly seeing services go to UNKNOWN status for 1-3 minutes and then recover. On the remotes I have:
>
> check_service_freshness=1
> service_freshness_check_interval=10
> check_host_freshness=1
> host_freshness_check_interval=60
> service_inter_check_delay_method=s
> max_service_check_spread=10
> service_interleave_factor=1
> host_inter_check_delay_method=s
> max_host_check_spread=30
> max_concurrent_checks=0
>
> It does appear as though checks are being run in parallel. I'm wondering how I can best determine where the problem is: with the execution of checks, submittal to the central system, or something else.
>
> Thanks.
> _sean

Sean McAvoy
NOC Acting Team Lead
Afilias Canada
P. 416.673.4194
Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering
Hi Jonathan,

Why not use check_by_ssh instead? Is there any pitfall (weakness) in using check_by_ssh compared to an agent like OCP?

Thanks
Sam

----- Original Message ----
From: Jonathan Call [EMAIL PROTECTED]
To: Sean McAvoy [EMAIL PROTECTED]; nagios-users@lists.sourceforge.net
Sent: Wednesday, October 17, 2007 7:19:46 AM
Subject: Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

Sean;

I have a very large deployment so I use this tool: http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal NSCA daemon listens on the central server.

Jonathan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED]] On Behalf Of Sean McAvoy
Sent: Monday, October 15, 2007 12:09 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

On further investigation, it looks as though the problem is with the time taken to submit the results back to Nagios via send_nsca. I have read about a couple of different options for getting results back quickly. One is a bulk transfer: a file containing the results is sent via send_nsca, executed from cron. The other is a system that makes use of the performance data output option on the remote Nagios systems and submits the results using a custom daemon on both ends. Does anybody know of any other options? Also, are there any guides to setting up either of these options? Most of what I have read is email threads.

Thanks.

On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
> Hello,
>
> I have 1 central Nagios system with 5 distributed servers. I have enabled freshness checking on both the central and remote systems. I am constantly seeing services go to UNKNOWN status for 1-3 minutes and then recover. On the remotes I have:
>
> check_service_freshness=1
> service_freshness_check_interval=10
> check_host_freshness=1
> host_freshness_check_interval=60
> service_inter_check_delay_method=s
> max_service_check_spread=10
> service_interleave_factor=1
> host_inter_check_delay_method=s
> max_host_check_spread=30
> max_concurrent_checks=0
>
> It does appear as though checks are being run in parallel. I'm wondering how I can best determine where the problem is: with the execution of checks, submittal to the central system, or something else.
>
> Thanks.
> _sean

Sean McAvoy
NOC Acting Team Lead
Afilias Canada
P. 416.673.4194
[Nagios-users] Distributed monitoring Freshness checking failing then recovering
Hello,

I have 1 central Nagios system with 5 distributed servers. I have enabled freshness checking on both the central and remote systems. I am constantly seeing services go to UNKNOWN status for 1-3 minutes and then recover. On the remotes I have:

check_service_freshness=1
service_freshness_check_interval=10
check_host_freshness=1
host_freshness_check_interval=60
service_inter_check_delay_method=s
max_service_check_spread=10
service_interleave_factor=1
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0

It does appear as though checks are being run in parallel. I'm wondering how I can best determine where the problem is: with the execution of checks, submittal to the central system, or something else.

Thanks.
_sean
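One way to reason about "UNKNOWN for 1-3 minutes, then recover" is to compare the arrival times of passive results on the central server against the freshness threshold. The Python sketch below is a deliberately simplified model, not Nagios's actual freshness algorithm (which only evaluates freshness every `service_freshness_check_interval` seconds and may derive the threshold automatically). The function names and the sample numbers are made up for illustration.

```python
def is_stale(last_result_time, freshness_threshold, now):
    """True if the newest result is older than the threshold (all in seconds)."""
    return (now - last_result_time) > freshness_threshold


def stale_windows(arrival_times, freshness_threshold):
    """Return (start, end) spans in which a service would have looked stale.

    arrival_times: sorted timestamps at which passive results arrived.
    A span opens when the threshold expires and closes when the next
    (late) result arrives.
    """
    windows = []
    for previous, following in zip(arrival_times, arrival_times[1:]):
        deadline = previous + freshness_threshold
        if following > deadline:
            windows.append((deadline, following))
    return windows
```

Feeding this with arrival timestamps taken from the central server's log for one service shows whether results are merely arriving a minute or two late (which points at submission via send_nsca) rather than the checks themselves failing on the remote.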
Re: [Nagios-users] Distributed Monitoring Web Interface Issue
Hi Marco,

I will set this up. Thanks a lot!

Simon

From: Marco Supino [mailto:[EMAIL PROTECTED]]
Sent: May-09-07 1:43 AM
To: Simon Marcil; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] Distributed Monitoring Web Interface Issue

Hi,

I have the same scenario, and what I did was to enable active checks on all services but set check_period to none, so a check is never executed unless freshness checking runs it.

Marco.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Marcil
Sent: Wednesday, May 09, 2007 02:59
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring Web Interface Issue

I have a distributed monitoring setup, with several servers reporting back to a central server. The central server also does a couple of checks, but most of its hosts and services are disabled (because it receives the info from the other servers). The problem I have is with the web interface. In the Tactical Overview, all the problems reported from distributed servers show up as disabled. This means that we can't get a correct listing of Unhandled Problems. For example, let's say I have 3 hosts down coming from a distributed server, 1 of which has been acknowledged. I will see the following:

3 Down
1 Acknowledged
3 Disabled

In this example, is there a way to list only the hosts which are down and not acknowledged? If this wasn't clear, let me know and I will clarify.

Simon
Re: [Nagios-users] Distributed Monitoring Web Interface Issue
Marco,

I have the same issue, with all my services showing up as disabled because I have active checks turned off on my centralized Nagios interface. I am running Nagios 2.9 and I configured what you said, but that didn't fix the problem; it just caused a couple of others. Here is what I did:

1) I didn't configure the service to have active_checks on and had no check_period configured. This did resolve the issue of the service saying disabled because the active check was turned on. This caused another problem: the active checks were being done after 30 seconds, way before my freshness_threshold of 600 seconds and my normal_check_interval of 3 minutes. It shouldn't have checked it at all.

2) I tried it another way, creating a check_period called none which had no times configured to check, and made the service use this as its check_period. When I did this, it never ran an active check, even though I had a freshness_threshold configured.

Is there something I did wrong, or are you running an older version of Nagios than 2.9? If anyone else has found a resolution to this problem, I would appreciate your comments.

Thank you,
Jeff

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Marco Supino
Sent: Tuesday, May 08, 2007 10:43 PM
To: Simon Marcil; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed Monitoring Web Interface Issue

Hi,

I have the same scenario, and what I did was to enable active checks on all services but set check_period to none, so a check is never executed unless freshness checking runs it.

Marco.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Marcil
Sent: Wednesday, May 09, 2007 02:59
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring Web Interface Issue

I have a distributed monitoring setup, with several servers reporting back to a central server. The central server also does a couple of checks, but most of its hosts and services are disabled (because it receives the info from the other servers). The problem I have is with the web interface. In the Tactical Overview, all the problems reported from distributed servers show up as disabled. This means that we can't get a correct listing of Unhandled Problems. For example, let's say I have 3 hosts down coming from a distributed server, 1 of which has been acknowledged. I will see the following:

3 Down
1 Acknowledged
3 Disabled

In this example, is there a way to list only the hosts which are down and not acknowledged? If this wasn't clear, let me know and I will clarify.

Simon
Re: [Nagios-users] Distributed Monitoring Web Interface Issue
Hi, You are right, I also modified a source file, allowing freshness to run even in check_period=none, this is the patch, if a service has check freshness, it will run it. Marco. [EMAIL PROTECTED]:~$ diff -Naur /tmp/new/nagios-2.8/base/checks.c /tmp/nagios-2.8/base/checks.c --- /tmp/new/nagios-2.8/base/checks.c 2007-03-01 14:15:10.0 -0500 +++ /tmp/nagios-2.8/base/checks.c 2007-03-13 04:10:46.0 -0400 @@ -1732,8 +1732,8 @@ if(temp_service-is_being_freshened==TRUE) continue; - /* see if the time is right... */ - if(check_time_against_period(current_time,temp_service-check_period)==E RROR) + /* see if the time is right... but we're using auto-freshness threshold */ + if(check_time_against_period(current_time,temp_service-check_period)==E RROR temp_service-check_freshness==FALSE) continue; /* EXCEPTION */ @@ -1741,6 +1741,7 @@ if(temp_service-check_interval==0 temp_service-freshness_threshold==0) continue; + #ifdef TEST_FRESHNESS printf(CHECKFRESHNESS 3\n); #endif From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Shumard - DefenseWeb Technologies Sent: Wednesday, May 09, 2007 20:58 To: nagios-users@lists.sourceforge.net Subject: Re: [Nagios-users] Distributed Monitoring Web Interface Issue Marco, I have the same issue with all my services showing up as disabled because I have active checks turned off on my Centralized Nagios Interface. I and running Nagios 2.9 and I configured what you said but that didn't fix the problem it just caused a couple of others. Here is what I did bellow. 1) I didn't configure the service to have active_checks on and had no check_period configured. This did resolve the issue of the service saying disabled because the active check was turned on. This caused another problem. The active checks were being done after 30 seconds way before my freshness_threshold of 600 seconds and my normal_check_interval of 3 minutes. It shouldn't have checked it at all. 
2) I tried it another way of creating a check_period called none which had not times configured to check. I made the service use this as its check_period. When I did this it then never ran an active check even though I had a freshness_threshold configured. Is there something I did wrong, or are you running an older version of Nagios then 2.9? If anyone else has found a resolution to this problem I would appreciate your comments. Thank you, Jeff From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marco Supino Sent: Tuesday, May 08, 2007 10:43 PM To: Simon Marcil; nagios-users@lists.sourceforge.net Subject: Re: [Nagios-users] Distributed Monitoring Web Interface Issue Hi, I have the same scenario, and what I did was to enable active checks on all services, but put check_period to none, so a check is never executed, except if freshness checking runs it. Marco. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Simon Marcil Sent: Wednesday, May 09, 2007 02:59 To: nagios-users@lists.sourceforge.net Subject: [Nagios-users] Distributed Monitoring Web Interface Issue I have a distributed monitoring setup. I have several servers reporting back to a central server. The central server also does a couple checks but most of it's hosts and services are disabled (because it receives the info from other servers). The problem I have is with the web interface. In the Tactical Overview all the problems reported from distributed servers show up as disabled. This means that we can't have a correct listing of Unhandled Problems. For example, Let's say I have 3 hosts down coming from a distributed server with 1 that has been acknowledged. I will have the following: 3 Down 1 Acknowledged 3 Disabled In this example, is there a way to only list the host which are down and not acknowledged??? If this wasn't clear let me know and I will clearify. 
Simon

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
[Nagios-users] Distributed Monitoring Web Interface Issue
I have a distributed monitoring setup. I have several servers reporting back to a central server. The central server also does a couple of checks, but most of its hosts and services are disabled (because it receives the info from other servers).

The problem I have is with the web interface. In the Tactical Overview, all the problems reported from distributed servers show up as disabled. This means that we can't have a correct listing of Unhandled Problems.

For example, let's say I have 3 hosts down coming from a distributed server, with 1 that has been acknowledged. I will have the following:

3 Down
1 Acknowledged
3 Disabled

In this example, is there a way to list only the hosts which are down and not acknowledged? If this wasn't clear, let me know and I will clarify.

Simon
Re: [Nagios-users] Distributed Monitoring Web Interface Issue
Hi, I have the same scenario, and what I did was to enable active checks on all services, but put check_period to none, so a check is never executed, except if freshness checking runs it. Marco.
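Marco's workaround only works if freshness checking is allowed to fire outside check_period, which is what his patch to base/checks.c (Nagios 2.8) in this thread changes. As a rough illustration only, here is a Python model of the before and after conditions. It assumes the three tests visible in the diff are the only ones that matter; the `Service` class and the `in_check_period` flag are made-up stand-ins for the C struct and for check_time_against_period(), not anything from Nagios itself.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical stand-in for the fields the patch touches.
    in_check_period: bool        # True if "now" falls inside check_period
    check_freshness: bool
    check_interval: int
    freshness_threshold: int
    is_being_freshened: bool = False


def skip_freshness_stock(svc: Service) -> bool:
    """Model of the unpatched test: outside check_period, never freshen."""
    if svc.is_being_freshened:
        return True
    if not svc.in_check_period:
        return True
    return svc.check_interval == 0 and svc.freshness_threshold == 0


def skip_freshness_patched(svc: Service) -> bool:
    """Model of the patched test: the period only blocks services
    that do not have check_freshness enabled."""
    if svc.is_being_freshened:
        return True
    if not svc.in_check_period and not svc.check_freshness:
        return True
    return svc.check_interval == 0 and svc.freshness_threshold == 0


# A passive service with check_period "none" and a 600 s threshold.
passive = Service(in_check_period=False, check_freshness=True,
                  check_interval=3, freshness_threshold=600)
print(skip_freshness_stock(passive))    # skipped: freshness never fires
print(skip_freshness_patched(passive))  # not skipped: freshness can fire
```

Under this model the unpatched logic matches what Jeff saw on 2.9 with an empty check_period: the freshness check is skipped every time.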
[Nagios-users] Distributed Monitoring : Monitoring server sending alert .
Hi folks,

I have a distributed Nagios configuration that is running well, except that the monitoring server has started to send notifications even though notifications are disabled in its configuration (enable_notifications=0). I am using Nagios 1.2 on SUSE 9.

Another question: is there any way to do passive host monitoring in Nagios 1.x? I would rather not use active monitoring on my central server.

Best regards,
Saulo Silva
[Nagios-users] Distributed Monitoring
Dear all,

I have one Nagios server working in my company, and I need to add another Nagios server to monitor other servers in other subnets. I don't know if there is any solution for having 2 Nagios servers (1 central Nagios) and 1 monitoring screen. That is, the second server would send all check results to the central Nagios.

Thank you in advance,
Moayad Mohammad
Re: [Nagios-users] Distributed Monitoring
Yes you can. Do a Google search for distributed monitoring.

Basically it comes down to having one Nagios configured as a slave. It then passes all its info to the main Nagios. The services that are monitored on the slave Nagios are configured as passive on the main Nagios, but the data is still displayed with all the other active check data on the same screen.

One thing that I did: because the data from the slave gets passed on passively and NOT on request, if the slave Nagios dies, the main one will not know about it. So my slave Nagios is actively checked with a ping by the main Nagios.

John

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Moayad Mohammad
Sent: 22 January 2007 11:29
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring

Dears, I have one nagios server working in my company, and I need to add another nagios server to monitor another servers in other subnets, I don't know if there's any solution to have 2 nagios servers(1 central nagios) and 1 monitor screen... it's mean the second server will send all check results to the central nagios.

Thank you in advance
Moayad Mohammad
Re: [Nagios-users] Distributed Monitoring - Redundancy
>> I'm running Nagios in a distributed environment which is working very well. I would like to add a little redundancy to the picture now that I have everything working. ;-)
>> ...
>> It seems that a secondary cold spare might be the best solution. Then there are maintenance issues with keeping software up to date, etc.
>
> No - look at Linux-HA (heartbeat) and DRBD.
>
>> So many problems, so little beer.
>
> The Linux-HA/DRBD setup is well understood and quite easy.

We use Linux-HA here to have a redundant setup of two servers. In fact, we are running our Nagios on one and our MRTG on the other, and they both provide failover for each other. They pass a set of virtual IPs, services, disks and filesystems between each other. It works very well and is very reliable.

I use the v1.x Linux-HA (rather than the newer, feature-rich v2.x) as we only have a 2-machine failover cluster, and simplicity makes things easier.

We have an external SCSI disk pack connected to two Adaptec ServeRAID cards (these helpfully have locking capabilities for just this setup). There are two LUNs on the external pack passed between the servers. Heartbeat goes via serial cable, crossover network cable, and the main network. For people who are really paranoid, I also have a little Linux-HA plugin which uses a tiny raw partition on the disk to effect an additional lock before mounting the filesystem.

In a failover situation, we lose only about 30 seconds and everything is fine. Nagios (since it uses text files) is very stable. I also run MySQL on the Nagios server to hold archives and summarised logs, and this passes back and forth with no difficulty as well.

If anyone would like detailed instructions, please contact me directly.

Steve
Re: [Nagios-users] Distributed Monitoring - Redundancy
On Fri, 2006-06-23 at 09:51 -0700, Mike Koponick wrote:
> Hello Everyone,
>
> I'm running Nagios in a distributed environment which is working very well. I would like to add a little redundancy to the picture now that I have everything working. ;-)
>
> Since I'm running a distributed environment, how can I add a secondary "Central-Server" to the picture? I'm not worried about the sensors or remote Nagios servers, just the central portion of the network.
>
> The problems that I see are as follows:
>
> The remote servers send data via NSCA to the central server. Would they also have to send a second connection to the secondary server?

One way of doing it.

> NDO now sends data to my MySQL server, will the secondary server also need to send data? This opens a can of worms in terms of duplicate data, etc.

You'd need to replicate this as well.

> It seems that a secondary "cold spare" might be the best solution. Then there are maintenance issues with keeping software up to date, etc.

No - look at Linux-HA (heartbeat) and DRBD.

> So many problems, so little beer.

The Linux-HA/DRBD setup is well understood and quite easy.

Greg

> Thanks in advance,
> Mike
[Nagios-users] Distributed Monitoring - Redundancy
Hello Everyone,

I'm running Nagios in a distributed environment which is working very well. I would like to add a little redundancy to the picture now that I have everything working. ;-)

Since I'm running a distributed environment, how can I add a secondary Central-Server to the picture? I'm not worried about the sensors or remote Nagios servers, just the central portion of the network.

The problems that I see are as follows:

The remote servers send data via NSCA to the central server. Would they also have to send a second connection to the secondary server?

NDO now sends data to my MySQL server, will the secondary server also need to send data? This opens a can of worms in terms of duplicate data, etc.

It seems that a secondary cold spare might be the best solution. Then there are maintenance issues with keeping software up to date, etc.

So many problems, so little beer.

Thanks in advance,
Mike
Re: [Nagios-users] Distributed Monitoring - Redundancy
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED]] On Behalf Of Mike Koponick
> Sent: Friday, June 23, 2006 11:52 AM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Distributed Monitoring - Redundancy
>
> Hello Everyone,
>
> I'm running Nagios in a distributed environment which is working very well. I would like to add a little redundancy to the picture now that I have everything working. ;-)
>
> The remote servers send data via NSCA to the central server. Would they also have to send a second connection to the secondary server?

Yup. Easy enough to add additional calls to send_nsca in submit_check_result, ala --

    /bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host1 -p 5668 -c /usr/local/nagios/etc/send_nsca.cfg
    /bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host2 -p 5668 -c /usr/local/nagios/etc/send_nsca.cfg
    /bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host2 -p 5669 -c /usr/local/nagios/etc/send_nsca.cfg

(yes, I send results to 3 different Nagios installations)

> NDO now sends data to my MySQL server, will the secondary server also need to send data? This opens a can of worms in terms of duplicate data, etc.

I don't use NDO yet, but I can imagine that you would experience duplication of data unless you had a different DB for your secondary host and reconciled them some other way.

--
Marc
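For anyone who prefers a script to three echo lines, the fan-out Marc describes could be sketched in Python roughly as follows. This is only an illustration: the function names are made up, the paths and ports are copied from his example, and the host is passed with `-H` (the form documented in send_nsca's usage text) rather than positionally as in his commands.

```python
import subprocess
from typing import List, Tuple

SEND_NSCA = "/usr/local/nagios/bin/send_nsca"          # path from the post
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"  # path from the post


def format_service_result(host: str, service: str,
                          return_code: int, output: str) -> str:
    """Build the tab-separated line, same field order as the echo above."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def send_nsca_commands(targets: List[Tuple[str, int]]) -> List[List[str]]:
    """One send_nsca argument list per (central host, port) pair."""
    return [[SEND_NSCA, "-H", host, "-p", str(port), "-c", SEND_NSCA_CFG]
            for host, port in targets]


def submit(line: str, targets: List[Tuple[str, int]]) -> None:
    """Pipe the same result line to every central server in turn."""
    for cmd in send_nsca_commands(targets):
        # check=False: one unreachable central server should not stop the rest
        subprocess.run(cmd, input=line, text=True, check=False)


# The three destinations from the post (host names are placeholders there too).
TARGETS = [("host1", 5668), ("host2", 5668), ("host2", 5669)]
```

Calling `submit(format_service_result(host, service, code, output), TARGETS)` from the OCSP handler would then send each result to all three installations.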
[Nagios-users] Distributed Monitoring
I've set up a distributed monitoring server. One issue I'm seeing is that the distributed server only updates the central server every 4-6 minutes. I have service checks running every 90 seconds on the distributed server, and I have it set to obsess over services.

Is there any way to adjust how often the send_nsca utility is actually run, or to adjust how often the distributed server updates the central server?

I have freshness turned on, and it always wants to go out and get the results, because it thinks they are stale after 2-3 minutes (threshold set to 450 seconds). But this creates double traffic and rather defeats the purpose of distributed monitoring.

Thank You,
Matt
RE: [Nagios-users] Distributed Monitoring
Results will be getting sent to your ocsp command every time a check result comes back on the distributed server if it's obsessing. Are you sure the checks are running every 90 seconds? Or, have you set a long command_check_interval in nagios.cfg?
Re: [Nagios-users] Distributed Monitoring
On Wed, Apr 12, 2006 at 08:06:08PM -0400, InnovationsTech, Matthew Thomas wrote:
> ran, or adjust how often the distributed server updates the central server?

If you are obsessing over services, then send_nsca is called for each and every service check.

> I have freshness turned on, and it always wants to go out and get the results, because it thinks they are stale after 2-3 min. (threshold set to 450sec). But this creates double traffic, and kind of defeats the reason for distributed monitoring.

It sounds like send_nsca is not actually succeeding in getting the data to the central server.

-Jason Martin

> Thank You,
> Matt

--
All stressed out, and no one to choke...
RE: [Nagios-users] Distributed Monitoring
Below are snippets from my configuration. Is there a way to debug send_nsca? I tried snoop on the port, tcpdump on the port, and tail on the nagios.log file, and I don't see when it's submitting the results. But according to the website, they are updating every 4-6 minutes.

nagios.cfg:

    command_check_interval=-1
    interval_length=30
    log_external_commands=1
    log_passive_checks=1

services.cfg:

    check_period            24x7
    max_check_attempts      2
    normal_check_interval   3    ; 90 seconds per check
    retry_check_interval    1    ; 30 seconds until retry on soft fail

Thanks for the assistance.

From: Morris, Patrick [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 12, 2006 20:29
To: InnovationsTech, Matthew Thomas; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] Distributed Monitoring

Results will be getting sent to your ocsp command every time a check result comes back on the distributed server if it's obsessing. Are you sure the checks are running every 90 seconds? Or, have you set a long command_check_interval in nagios.cfg?
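To make the interval arithmetic in those snippets explicit: Nagios multiplies the check and retry intervals by interval_length to get seconds. A small Python check of Matt's numbers, using the 450-second freshness threshold he mentioned at the start of the thread (the function names are just for illustration):

```python
def interval_seconds(interval_length: int, units: int) -> int:
    """Seconds between checks: interval_length x the configured interval."""
    return interval_length * units


def results_expected_within(threshold: int, check_seconds: int) -> int:
    """How many results should arrive before the threshold expires."""
    return threshold // check_seconds


INTERVAL_LENGTH = 30          # interval_length from nagios.cfg
FRESHNESS_THRESHOLD = 450     # seconds, from the first message in the thread

normal = interval_seconds(INTERVAL_LENGTH, 3)   # normal_check_interval 3
retry = interval_seconds(INTERVAL_LENGTH, 1)    # retry_check_interval 1

print(normal)                                            # 90
print(retry)                                             # 30
print(results_expected_within(FRESHNESS_THRESHOLD, normal))  # 5
```

So with these settings roughly five results per service should reach the central server inside one freshness window. If results are going stale anyway, that suggests they are not arriving, rather than that the threshold is too tight.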
[Nagios-users] Distributed monitoring problem
Hello all,

I'm trying to set up a distributed monitoring system. At the start all looked fine to me, but now I'm having some problems with not receiving all passive checks from other hosts.

The machine is an Intel(R) Xeon(TM) CPU 2.40GHz system with 512 MB RAM. The load is minimal. The only strange thing I can see is the memory settings:

nagios:/etc/nagios # cat /proc/meminfo
MemTotal:         514264 kB
MemFree:           30192 kB
Buffers:           44568 kB
Cached:           328004 kB
SwapCached:            8 kB
Active:           264908 kB
Inactive:         137824 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         514264 kB
LowFree:           30192 kB
SwapTotal:       1028120 kB
SwapFree:        1028020 kB
Dirty:               780 kB
Writeback:             0 kB
Mapped:            46188 kB
Slab:              75556 kB
Committed_AS:     100992 kB
PageTables:         1104 kB
VmallocTotal:     507896 kB
VmallocUsed:        7264 kB
VmallocChunk:     499760 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       4096 kB

The process info tells me this:

Time Frame             Checks Completed
<= 1 minute:           51 (16.6%)
<= 5 minutes:          221 (71.8%)
<= 15 minutes:         255 (82.8%)
<= 1 hour:             260 (84.4%)
Since program start:   261 (84.7%)

So it's receiving less than 85% of all checks :(

There will be more passive checks to be sent to this Nagios server. Do we need other hardware? Where do I need to look to solve this problem? The machines sending the passive check info are not too busy doing this; the checks are separated over three different servers.

One example...
This is /var/log/nagios/nagios.log:

[1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE ALERT: cat29-w11-backup;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-epager;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-email;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162941] Warning: The results of service 'PING' on host 'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds). I'm forcing an immediate check of the service.
[1135162951] SERVICE ALERT: cat29-w11-backup;PING;CRITICAL;SOFT;1;CRITICAL: Service results are stale!

It looks like it's going stale again too fast? Can somebody please help me :)

Best regards,
Rob Hassing
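The "stale by 32 seconds" figure can be reproduced from the timestamps in that log, which helps when judging whether the threshold is too tight. A minimal sketch, assuming the age is measured from the EXTERNAL COMMAND line (the numbers only line up exactly if that timestamp is used); the function name is hypothetical:

```python
def stale_by(last_result: int, checked_at: int, threshold: int) -> int:
    """Seconds past the freshness threshold; zero or negative means fresh."""
    age = checked_at - last_result
    return age - threshold


LAST_RESULT = 1135162484   # EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT
CHECKED_AT = 1135162941    # "Warning: The results of service 'PING' ... are stale"
THRESHOLD = 425            # threshold reported in the warning

print(CHECKED_AT - LAST_RESULT)                       # 457 seconds since last result
print(stale_by(LAST_RESULT, CHECKED_AT, THRESHOLD))   # 32
```

In other words, no new passive result for this service was processed for 457 seconds, so the thing to chase is why the next result took more than seven minutes to arrive.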
Re: [Nagios-users] Distributed monitoring problem
On Wed, 2005-12-21 at 12:08 +0100, Rob Hassing wrote:
> Hello all,

Hi Rob,

> I'm trying to setup a distributed monitoring system. At the start all looked fine too me, but now I'm having some problems on not receiving all passive checks from other hosts.

Distributed monitoring is waaay cool. :) The only thing that could lead to an issue is that the CGIs that come with the web interface don't scale very well. Here we ended up storing status in MySQL via a NEB module. We are now testing GroundWork's framework, and it appears to fit our needs. Only the config file generator was developed in-house; it sets up all the distributed agents properly and stores all configuration in a database.

> The machine is a Intel(R) Xeon(TM) CPU 2.40GHz system with 512 MB RAM.
> The process info tells me this:
>
> Time Frame             Checks Completed
> <= 1 minute:           51 (16.6%)
> <= 5 minutes:          221 (71.8%)
> <= 15 minutes:         255 (82.8%)
> <= 1 hour:             260 (84.4%)
> Since program start:   261 (84.7%)

Here is what we have:

<= 1 minute:           2383 (21.3%)
<= 5 minutes:          6138 (54.7%)
<= 15 minutes:         8321 (74.2%)
<= 1 hour:             10138 (90.4%)
Since program start:   10711 (95.5%)

> So it's receiving less then 85% of all checks :(
> There will be more passive checks to be send to this nagios server. Do we need other hardware ? Where do I need to look to solve this problem ?

To avoid stale services, you need to set freshness_threshold properly for your services. Here is a hint: setting freshness_threshold is a little strange, because we need to wait for the packet carrying the check result to arrive, and the fewer services you configure it on explicitly (letting Nagios calculate it instead), the better. But it is the only thing to configure to avoid stale service results.

We decided to make stale results appear in an Unknown status, because the cause could be nothing more than a traffic issue along the packet's path: backup/restore routines, high traffic load, and other things that can delay results.
> The machines sending the passive check info are not too busy doing this, the checks are seperated over three different servers.

Here we have 11 distributed servers sending check results via send_nsca, and each one has around 2k services configured. They are all SPARC servers sending to a SuSE 9.3 box on commodity hardware. This Linux machine has 2 GB of RAM and some SATA disks. It is a P4-HT.

> One example... This is /var/log/nagios/nagios.log:
>
> [1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms
> [1135162491] SERVICE ALERT: cat29-w11-backup;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.89 ms
> [1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-epager;PING OK - Packet loss = 0%, RTA = 0.89 ms
> [1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-email;PING OK - Packet loss = 0%, RTA = 0.89 ms
> [1135162941] Warning: The results of service 'PING' on host 'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds). I'm forcing an immediate check of the service.
> [1135162951] SERVICE ALERT: cat29-w11-backup;PING;CRITICAL;SOFT;1;CRITICAL: Service results are stale!
>
> It looks like its stale again too fast ?

Well, those last two lines don't indicate two stale services. The first line, which tells you the freshness_threshold, indicates that the central Nagios waited for 425 seconds and the result of the active check arrived 32 seconds later. The last line shows the active check being processed by the central Nagios. It then appears as a critical alert on the web interface.

The active check, stale_service.sh or whatever command you place there, is what gets processed. (It can be the real check, in which case the central Nagios will actively check on stale results, but this will cause some load troubles. :)

HTH

Regards,
--
Marcel Mitsuto Fucatu Sugano [EMAIL PROTECTED]
Universo Online S.A. -- http://www.uol.com.br
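As an illustration of the "stale results show up as Unknown" approach Marcel describes, the command attached to a passive service on the central server can be a tiny script that does no real checking and simply reports UNKNOWN. A hypothetical Python stand-in for his stale_service.sh follows; the function names and message text are invented, and only the exit code 3 for UNKNOWN is standard plugin behaviour.

```python
import sys
from typing import List, Tuple

STATE_UNKNOWN = 3  # standard plugin return code for UNKNOWN


def stale_result(service: str) -> Tuple[int, str]:
    """Return the (exit code, plugin output) pair for a stale service."""
    message = (f"UNKNOWN: no passive result received for {service} "
               "within the freshness threshold")
    return STATE_UNKNOWN, message


def main(argv: List[str]) -> int:
    """Print one line of plugin output and hand back the exit code."""
    service = argv[1] if len(argv) > 1 else "service"
    code, message = stale_result(service)
    print(message)
    return code

# When saved as a script, finish with:  sys.exit(main(sys.argv))
```

Because the script never contacts the monitored host, the central server does not take on the load of real active checks when results are merely delayed.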