Thanks to everyone. Let me explain the situation. The service in question is software developed by my own company. It "consumes" files from a defined directory, which are generated by another program. The number of files waiting in that directory is the metric I use for monitoring.
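
A minimal sketch of such a file-count check, written as a Nagios-style plugin in Python. It is not the script used in this setup: the function names, the argument order and the "at or above the threshold" comparison are all assumptions made for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the performance-data layout follow the usual plugin conventions.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check that counts files waiting in a directory."""
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_files(directory):
    """Count regular files directly inside `directory` (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def evaluate(count, warn, crit):
    """Map a file count to an (exit_code, label) pair.

    Assumption: a count at or above a threshold triggers that state.
    """
    if count >= crit:
        return CRITICAL, "CRITICAL"
    if count >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def main(argv):
    """Hypothetical usage: check_queue.py <directory> <warn> <crit>"""
    if len(argv) != 4:
        print("UNKNOWN - usage: check_queue.py <directory> <warn> <crit>")
        return UNKNOWN
    directory = argv[1]
    try:
        warn, crit = int(argv[2]), int(argv[3])
        count = count_files(directory)
    except (ValueError, OSError) as exc:
        # Bad thresholds or an unreadable directory: report UNKNOWN.
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    code, label = evaluate(count, warn, crit)
    # Text after '|' is performance data: label=value;warn;crit
    print(f"{label} - {count} files pending in {directory} "
          f"| files={count};{warn};{crit}")
    return code

# To run it as a plugin, end the script with:
#   import sys; sys.exit(main(sys.argv))
```

Keeping the threshold logic in `evaluate` separate from the directory scan makes it easy to test the state mapping without touching the filesystem.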

Like any software under constant development, it will eventually crash or freeze. When that happens, files accumulate in the directory. If the number of files crosses a threshold, the warning or critical flag is set. We DO investigate why the service stopped, but the service must be back up and running as fast as possible, which is why we restart it first. Later we can check what went wrong.

Some months ago I also wrote a simple bash script that monitors the number of files, restarts the service if necessary and logs the event. What I do not know is whether this is the best approach. Nagios gives me the visual tools to see in real time, on a big panel, whether everything is OK with my servers. So I wondered whether it can take proactive actions, and whether that approach would be better than my simple scripts.

dave stern - e-mail.pluribus.unum wrote:
> Ok, everyone agrees an event handler can take action to fix a problem, but bear in
> mind that this comes with caveats. Effectively, the Nagios event handler is
> treating a symptom; the disease goes merrily on its way. If a service stops, WHY did
> it stop in the first place? Most good sysadmins would tackle the problem from
> the system end to ensure that the service would never fail again. Furthermore,
> let's say a service failed for a reason, e.g. out of disk space. What good would it
> do to restart the service again? And if you build smarts into the event handler to
> look for and fix such a condition, is that the ONLY condition that could occur
> to stop this service?
>
> Having said all this, event handlers do have their place. We in fact use them
> to shut down hosts if the temperature gets too hot. You can imagine the
> testing we went through before rolling out something like this.
>
> On Thu, Sep 3, 2009 at 7:44 AM, Leonardo Carneiro <lscarne...@veltrac.com.br> wrote:
>
>> hello everyone.
>>
>> Started to play with Nagios a few days ago and I'm very excited with it.
>> I have a very small setup (2 Linux servers being monitored via NRPE by a
>> third Linux server) and I wrote some bash scripts to monitor some of
>> the services that we run on those servers (proprietary services,
>> not standard ones like ssh, apache and that stuff).
>>
>> I know Nagios can send SMS, email and other things to warn
>> administrators about problems, but can Nagios take any action to fix the
>> problem, like restarting the service if it reaches a critical state, or restarting
>> the service if it stays critical for more than 5 minutes?
>>
>> If yes, can someone just point me in the direction I should go? :)
>>
>> Thanks in advance, and sorry about my poor English. I'm from Brazil.
>> --
>>
>> *Leonardo de Souza Carneiro*
>> *Veltrac - Tecnologia em Logística.*
>> lscarne...@veltrac.com.br <mailto:lscarne...@veltrac.com.br>
>> http://www.veltrac.com.br <http://www.veltrac.com.br/>
>> /Fone Com.: (43)2105-5601/
>> /Av. Higienópolis 1601 Ed. Eurocenter Sl. 803/
>> /Londrina- PR/
>> /Cep: 86015-010/
>>
>> ------------------------------------------------------------------------------
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>> trial. Simplify your report design, integration and deployment - and focus on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now. http://p.sf.net/sfu/bobj-july
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting
>> any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null

--
*Leonardo de Souza Carneiro*
*Veltrac - Tecnologia em Logística.*
lscarne...@veltrac.com.br <mailto:lscarne...@veltrac.com.br>
http://www.veltrac.com.br <http://www.veltrac.com.br/>
/Fone Com.: (43)2105-5601/
/Av. Higienópolis 1601 Ed. Eurocenter Sl. 803/
/Londrina- PR/
/Cep: 86015-010/