Re: dtlogging in combination with mon.cgi
We have hostgroups with 150-250 hosts, and since dtlog lacks the details we need, we finally hacked together a program to analyze monhist.log. We configured an empty alert (just alert.template) so that every alert gets logged with the date and the failed hosts. It takes a bit of data shuffling, but in the end you at least get a host-based list of downtimes which can easily be fed into a database for further processing.

We tried to use rrdmon, but installing mon with more than 2,000 hostgroups is not much fun, and monitoring it via mon.cgi is just impossible. We still need another tool to monitor the lines at shorter intervals. Right now we're polling every 10 minutes, which is sufficient for internal use, but we'd really need 1-2 minutes.

Uwe

-Original Message-
From: Andrew Ryan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 5, 2002 21:21
To: alan evetts; [EMAIL PROTECTED]
Subject: Re: dtlogging in combination with mon.cgi

On Tuesday 04 June 2002 03:15 pm, alan evetts wrote:
> It seems that mon has built-in dtlogging, and mon.cgi likes to display
> this, too. Which is really helpful _except_ for the part where it only
> seems to display group/service downtime - not group/host/service.

Since the concept of a 'host' doesn't officially exist in mon as of yet, this is kind of impossible unless one does unnatural stuff with parsing the dtlog. I say 'unnatural' because one is reliant upon the monitors to output the failed hostnames in the summary line (which some monitors don't do), and on the dtlog to accurately track failing hosts, which it doesn't.

For example, if a hostgroup contains hosts A, B, C, let's say host A fails. The dtlog may record that A failed, assuming your monitor is well-behaved. If B fails 2 minutes later, while A is still down, this doesn't get recorded in the dtlog. If A comes back 2 minutes after B fails, and B is down for 2 more days while A is up, the net result is that the dtlog thinks that B was never down and A was down for 2 days.
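[Editorial aside: the mis-attribution described above can be reproduced with a small simulation. This is a hypothetical sketch in Python, not mon code; it only models a log that records one summary per group-level outage.]

```python
# Hypothetical sketch: how a group-level downtime log mis-attributes
# per-host downtime. Times are minutes; hosts A, B, C form one hostgroup.

failed = set()           # hosts currently failing
dtlog = []               # (start, end, summary): one record per group outage
outage_start = None
outage_summary = None

events = [
    (0,    "fail", "A"),   # A fails; the outage opens with summary "A"
    (2,    "fail", "B"),   # B fails while the group is already down: not logged
    (4,    "ok",   "A"),   # A recovers; group still down, so nothing logged
    (2882, "ok",   "B"),   # B recovers two days later; the outage closes
]

for t, kind, host in events:
    if kind == "fail":
        if not failed:                 # first failure opens the outage
            outage_start, outage_summary = t, host
        failed.add(host)
    else:
        failed.discard(host)
        if not failed:                 # last recovery closes the outage
            dtlog.append((outage_start, t, outage_summary))

print(dtlog)   # [(0, 2882, 'A')]: B's two-day outage is charged to A
```

The single record spans the whole two days but carries only A's name, which is exactly the "B was never down and A was down for 2 days" result described above.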
> I'd love to be able to reflect back and see precisely which server in a
> hostgroup (I have up to 10 in each), and which service failed.

Me too, that's why I wrote dtquery. I suggest you give it a try. dtquery at least makes guesses and has a nice (IMHO) query interface to boot.

http://www.nam-shub.com/files/dtquery/

> Does anyone have a patch or a method for doing this? I'd rather not use
> any additional software to store the stats, as the way the downtime log
> in mon.cgi is done is perfect for my use - just not detailed enough. I
> don't want/need graphs; if something failed once, a month ago, for 5
> minutes, it'll be hard to see - but in the way mon.cgi shows the dtlog
> it is quite obvious.

I used the code from mon.cgi inside of dtquery to generate the same table that you see in mon.cgi. You don't actually need to generate the graphs with dtquery. If you don't install the graphing programs, you get just the information it sounds like you're looking for.

andrew

Message was scanned by MailSweeper.
Re: logging to database and severity
Hello Andrew,

after looking into the patch from Dan Urist and playing around with fping.monitor a bit, it seems to work! I missed the point that mon can already handle alerts based on the exit status; the severity patch is rather an add-on for the GUI (but a rather useful one...). My fping.monitor now reports the number of unreachable hosts, and our alerts are triggered based on these numbers.

Thanks
Uwe

--
Dr. Uwe Kreibaum
Lotterie-Treuhandgesellschaft mbH Hessen
Tel.: +49 611 3612-0
Fax: +49 611 3612-356
Re: Re: logging to database and severity
Hello Gilles,

I'm rather interested in testing your client... and I guess most users on the list won't mind if it doesn't work out of the box, as long as one can get it going at all.

Uwe

--
Dr. Uwe Kreibaum
Lotterie-Treuhandgesellschaft mbH Hessen
Tel.: +49 611 3612-0
Fax: +49 611 3612-356

-Original Message-
From: Gilles Lamiral [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 13, 2002 03:17
To: '[EMAIL PROTECTED]'
Subject: Re: Re: logging to database and severity

> Our folks would like to run some kind of report like "show me the
> downtime for these 10 hosts within the last 14 days"

I wrote a client for mon called dbmon (used to be rrdmon).

Features:
- dbmon is a mon client, so it can stay on a remote host.
- dbmon stores status and response time, per host, for the protocols dns, fping, ftp, https, http, imap, ldap, nntp, pop3, and smtp. There is also a template for writing a monitor to store anything you need.
- dbmon stores to a remote MySQL database (optional).
- dbmon stores to a local rrd database (MRTG successor).

It is GPL software, but I don't have time to make a good public release. It works on Linux and Solaris 2.5/2.6/7/8. Written in Perl. I can make a big tarball (easy) and someone can lighten it.

--
Au revoir,
Gilles Lamiral. France, L'Hermitage (35590).
33 (0) 2 99 78 62 49
33 (0) 6 20 79 76 06
Re: logging to database and severity
Thanks for the replies - just to clarify: we have dtquery up and running, but it's not exactly what we're looking for. Our folks would like to run some kind of report like "show me the downtime for these 10 hosts within the last 14 days" - which should be a classic case for a database... but it's beyond my Perl knowledge to use dtquery as an example for a new program that stores values from dt.log in a database. And I think a better solution would be to store the data directly via the alert.

I'll have a closer look at fping.monitor; it looks simple enough that even I can understand it! :-) I'll also take a closer look at mrtg again (which we've got installed as well). So far we've only used it for general traffic monitoring on our routers, and as far as I remember there's some built-in data compression in the graphics, so with long-term logs the data becomes more and more fuzzy... which is fine for traffic stats, but with a database we could re-generate precise downtime information whenever needed.

Thanks again
Uwe

-Original Message-
From: Andrew Ryan [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 11, 2002 19:58
To: Kreibaum, Uwe
Cc: [EMAIL PROTECTED]
Subject: Re: logging to database and severity

At 12:43 PM 1/11/02 +0100, you wrote:
> I had a look at dtquery but must admit that it is far beyond my current
> perl know-how to adapt it...

dtquery uses mon-generated downtime logs as its 'database'. It may not be what you want or need for your case, however. The nice thing about it is that it gives you a query interface and doesn't require any extra work beyond enabling downtime logs and setting up dtquery initially.

> Furthermore, I'm looking for a way to classify the failures reported by
> fping:

What you describe here is possible by modifying fping.monitor. You could easily add a flag that calculated the number of hosts it was passed, and only gave an error if a certain number, or percentage, of those hosts were marked as unreachable.
This would be pretty simple Perl. Definitely leverage the 'exit=' flag for sending out different alerts.

andrew
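[Editorial aside: "store the data directly via the alert", as suggested above, could be a tiny alert script that writes each failure into a database, after which the "downtime for these 10 hosts within 14 days" report is a single query. A hypothetical sketch in Python with SQLite; the schema, function names, and plain-argument interface are made up for illustration - a real mon alert receives its parameters via the flags documented for mon.]

```python
# Hypothetical sketch: log each failure to SQLite so downtime reports
# become SQL queries. Schema and argument handling are illustrative only.
import sqlite3
import sys
import time

def record_failure(db_path, group, service, hosts, when=None):
    """Insert one row per failed host into a 'downtime' table."""
    when = when if when is not None else int(time.time())
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS downtime (
               ts      INTEGER,   -- unix timestamp of the alert
               grp     TEXT,      -- hostgroup name
               service TEXT,      -- e.g. 'ping'
               host    TEXT       -- one failed host per row
           )"""
    )
    conn.executemany(
        "INSERT INTO downtime (ts, grp, service, host) VALUES (?, ?, ?, ?)",
        [(when, group, service, h) for h in hosts],
    )
    conn.commit()
    conn.close()

def downtime_report(db_path, hosts, days=14):
    """Count logged failures per host over the last `days` days."""
    cutoff = int(time.time()) - days * 86400
    conn = sqlite3.connect(db_path)
    marks = ",".join("?" for _ in hosts)
    rows = conn.execute(
        f"SELECT host, COUNT(*) FROM downtime "
        f"WHERE ts >= ? AND host IN ({marks}) GROUP BY host",
        [cutoff, *hosts],
    ).fetchall()
    conn.close()
    return dict(rows)

if __name__ == "__main__" and len(sys.argv) >= 5:
    # e.g. dbalert.py mon.db routers ping hostA hostB
    record_failure(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4:])
```

Because every failure lands as one row per host, per-host precision never degrades over time - the advantage over MRTG-style consolidated graphs mentioned in the thread.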
logging to database and severity
Hello,

we're running mon to monitor a net of (currently) 1,100 nodes. To gather long-term statistics on failures, we'd like to store the results from fping in a database. Has anyone done something like this before? I had a look at dtquery but must admit that it is far beyond my current Perl know-how to adapt it... bugzilla-alert would be fine if there were an upalert to close the bugs (which is not possible because mon doesn't know the bug ID...).

Furthermore, I'm looking for a way to classify the failures reported by fping: we've grouped the hosts 50 per hostgroup (will be about 300 when everything is installed). Generally, 3 to 5 hosts are not reachable (turned off or whatever); this is our normal condition, so it only needs to be recorded. In case of a major network failure, the numbers go up to 70 percent, or even the whole hostgroup is down. Is there a way to generate alerts only in these cases? E.g. if more than 30 percent are down, generate alert 1; otherwise just log the downtime (or whatever is appropriate). I think (though I haven't tried it) this is a different functionality from the severity patch just posted to the list...?

Uwe
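[Editorial aside: the threshold behaviour asked for here boils down to counting unreachable hosts and choosing an exit status from the fraction that is down; a monitor signals severity through its exit status, and alerts can then be routed on that value. A hypothetical sketch in Python; the thresholds, labels, and exit codes are illustrative, not mon conventions.]

```python
# Hypothetical sketch: classify a hostgroup's health by the fraction of
# unreachable hosts, so a handful of switched-off machines is merely
# recorded while a major outage (>30% down) raises a real alert.

def classify(total_hosts, unreachable, major_fraction=0.30):
    """Return (exit_code, label) for one monitoring pass."""
    if total_hosts == 0 or unreachable == 0:
        return 0, "ok"
    fraction = unreachable / total_hosts
    if fraction > major_fraction:
        return 2, "major outage"      # e.g. page someone
    return 1, "minor"                 # e.g. just record the downtime

# A hostgroup of 50 with 4 machines switched off is the normal condition...
print(classify(50, 4))    # (1, 'minor')
# ...but 35 of 50 unreachable crosses the 30% line.
print(classify(50, 35))   # (2, 'major outage')
```

Wiring different alerts to the two non-zero exit codes then gives exactly the "alert 1 above 30 percent, otherwise just log" split described above.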