Re: dtlogging in combination with mon.cgi

2002-06-07 Thread Kreibaum, Uwe

We have hostgroups with 150-250 hosts... and since dtlog lacks the
details we need, we finally hacked together a program to analyze
monhist.log - we configured an empty alert (just alert.template), so
that every alert gets logged with the date and the failed hosts. It
takes a bit of data shuffling, but in the end you at least get a
host-based list of downtimes which can easily be fed into a database
for further processing.
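The post-processing step described above can be sketched briefly. This is not the actual program (which was never posted); the log line format assumed here - epoch timestamp, group, service, then the failed hosts - is invented for illustration, since the real alert.template output may differ:

```python
# Minimal sketch: turn logged alert lines into a per-host list of failure
# events. Assumed (hypothetical) line format:
#   "<epoch> <group> <service> <host> [<host> ...]"
from collections import defaultdict

def downtimes_per_host(lines):
    """Collect the timestamps at which each host appeared in a failure alert."""
    events = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        stamp, group, service = fields[0], fields[1], fields[2]
        for host in fields[3:]:
            events[host].append((int(stamp), group, service))
    return dict(events)

log = [
    "1023400000 servers ping hostA hostB",
    "1023400600 servers ping hostA",
]
by_host = downtimes_per_host(log)
print(sorted(by_host))        # ['hostA', 'hostB']
print(len(by_host["hostA"]))  # 2
```

The resulting per-host event lists are exactly the shape that can be bulk-loaded into a database table for further processing.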

We tried to use rrdmon, but installing mon with more than 2,000
hostgroups is no fun, and monitoring it via mon.cgi is simply
impossible. We still need another tool to monitor the lines at shorter
intervals; right now we're polling every 10 minutes, which is
sufficient for internal use, but we would really need 1-2 minute
intervals.

Uwe


-Original Message-
From: Andrew Ryan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 5, 2002 21:21
To: alan evetts; [EMAIL PROTECTED]
Subject: Re: dtlogging in combination with mon.cgi


On Tuesday 04 June 2002 03:15 pm, alan evetts wrote:
> It seems that mon has built in dtlogging.. and mon.cgi likes to display
> this, too. which is really helpful _except_ for the part where it only
> seems to display group/service downtime - not group/host/service.

Since the concept of a 'host' doesn't officially exist in mon as of yet,
this is kind of impossible unless one does unnatural stuff with parsing
the dtlog. I say 'unnatural' because one is reliant upon the monitors to
output the failed hostnames in the summary line (which some monitors
don't do), and for the dtlog to accurately track failing hosts, which it
doesn't.

For example, if a hostgroup contains hosts A, B, and C, let's say host A
fails. The dtlog may record that A failed, assuming your monitor is
well-behaved. If B fails 2 minutes later, while A is still down, this
doesn't get recorded in the dtlog. If A comes back 2 minutes after B
fails, and B is down for 2 more days while A is up, the net result is
that the dtlog thinks that B was never down and A was down for 2 days.
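What the scenario above really calls for is per-host interval tracking rather than one group-level downtime record. A small sketch of that idea, with a made-up event format of (time, host, state) transitions:

```python
# Sketch of per-host interval tracking that avoids the dtlog problem:
# each host gets its own down/up transitions instead of sharing one
# "group is failing" interval. The event tuples here are hypothetical.
def host_intervals(events):
    """Turn (time, host, 'down'|'up') events into per-host (start, end) intervals."""
    down_since = {}   # host -> time it went down
    intervals = {}    # host -> list of (down_start, up_time)
    for t, host, state in sorted(events):
        if state == "down" and host not in down_since:
            down_since[host] = t
        elif state == "up" and host in down_since:
            intervals.setdefault(host, []).append((down_since.pop(host), t))
    return intervals

# The scenario from the mail: A fails, B fails 2 minutes later,
# A recovers 2 minutes after that, B stays down for 2 more days.
events = [(0, "A", "down"), (120, "B", "down"),
          (240, "A", "up"), (172800, "B", "up")]
iv = host_intervals(events)
print(iv["A"])  # [(0, 240)]
print(iv["B"])  # [(120, 172800)]
```

With overlapping outages separated per host like this, A correctly shows 4 minutes of downtime and B shows 2 days, which is exactly the information the group-level dtlog loses.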

>
> I'd love to be able to reflect back and see precisely which server in a
> hostgroup (I have up to 10 in each), and which service failed.

Me too, that's why I wrote dtquery. I suggest you give it a try. dtquery at 
least makes guesses and has a nice (IMHO) query interface to boot.

http://www.nam-shub.com/files/dtquery/

>
> Does anyone have a patch or a method for doing this? I'd rather not
> use any additional software to store the stats.. as the way the
> downtime log in mon.cgi is done is perfect for my use - just not
> detailed enough. I don't want/need graphs, as if something failed
> once, a month ago for 5 minutes - it'll be hard to see.. but in the
> way mon.cgi shows the dtlog - it is quite obvious.

I used the code from mon.cgi inside of dtquery to generate the same table 
that you see in mon.cgi.

You don't actually need to generate the graphs with dtquery. If you don't 
install the graphing programs, you get just the information it sounds like 
you're looking for.


andrew


Message was scanned by MailSweeper.



Re: logging to database and severity

2002-01-14 Thread Kreibaum, Uwe

Hello Andrew,

after looking into the patch from Dan Urist and a bit of playing around
with fping.monitor, it seems to work! I had missed the point that mon
can already handle alerts based on the exit status; the severity patch
is rather an add-on for the GUI (but a rather useful one...). My
fping.monitor now reports the number of unreachable hosts, and our
alerts are triggered based on these numbers.

Thanks
  Uwe

--
Dr. Uwe Kreibaum
Lotterie-Treuhandgesellschaft mbH Hessen, Tel.: +49 611 3612-0
FAX: +49 611 3612-356



Re: Re: logging to database and severity

2002-01-14 Thread Kreibaum, Uwe

Hello Gilles,

I'm rather interested in testing your client... and I guess
most users on the list won't mind if it's not working out
of the box as long as one can get it going at all.

Uwe
--
Dr. Uwe Kreibaum
Lotterie-Treuhandgesellschaft mbH Hessen, Tel.: +49 611 3612-0
FAX: +49 611 3612-356


-Original Message-
From: Gilles Lamiral [mailto:[EMAIL PROTECTED]]
Sent: Sunday, January 13, 2002 03:17
To: '[EMAIL PROTECTED]'
Subject: Re: Re: logging to database and severity



> Our folks would like to
> run some kind of report like "show me the downtime for these 10 hosts
> within the last 14 days"

I wrote a client for mon called dbmon (it used to be called rrdmon).

Features:

- dbmon is a mon client, so it can stay on a remote host.

- dbmon stores status and response time, per host, for the protocols
dns, fping, ftp, https, http, imap, ldap, nntp, pop3, smtp.
There is also a template for writing a monitor to store anything
you need.

- dbmon stores to a remote MySQL database (optional)
- dbmon stores to a local RRD database (the MRTG successor)

It is GPL software, but I don't have time to make a good public release.
It works on Linux and Solaris 2.5, 2.6, 7, and 8. Written in Perl.
I can make a big tarball (easy) and someone can slim it down.
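The storage model described above - one row per (host, protocol) measurement - can be sketched independently of dbmon itself, which was never released. The table and column names below are invented, and SQLite stands in for MySQL/RRD purely to keep the example self-contained:

```python
# Sketch of a dbmon-style schema: one row per check, so downtime and
# response-time reports become plain SQL. Names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        taken_at    INTEGER NOT NULL,   -- epoch seconds of the check
        host        TEXT    NOT NULL,
        protocol    TEXT    NOT NULL,   -- fping, http, smtp, ...
        up          INTEGER NOT NULL,   -- 1 = reachable, 0 = failed
        response_ms REAL                -- NULL when the check failed
    )
""")
rows = [
    (1011000000, "www1", "http", 1, 42.0),
    (1011000600, "www1", "http", 0, None),
]
conn.executemany("INSERT INTO measurement VALUES (?,?,?,?,?)", rows)
failures = conn.execute(
    "SELECT COUNT(*) FROM measurement WHERE host='www1' AND up=0"
).fetchone()[0]
print(failures)  # 1
```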


-- 
Au revoir,
Gilles Lamiral. France, L'Hermitage (35590).
33 (0) 2 99 78 62 49
33 (0) 6 20 79 76 06





Re: logging to database and severity

2002-01-12 Thread Kreibaum, Uwe

Thanks for the replies - just to clarify: we have dtquery up & running,
but it's not exactly what we're looking for. Our folks would like to
run some kind of report like "show me the downtime for these 10 hosts
within the last 14 days" - which should be a classical case for a
database... but it's beyond my Perl knowledge to use dtquery as a
template for a new program that stores values from dt.log in a
database. And I think a better solution would be to store the data
directly via the alert. I'll have a closer look at fping.monitor; it
looks like it's still simple enough that even I can understand it! :-)
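The "downtime for these hosts within the last 14 days" report is indeed a one-liner once downtime lives in a database. A self-contained sketch, with an invented table of per-host down intervals (SQLite used only to make the example runnable):

```python
# Hypothetical table: one row per outage, (host, down_start, down_end)
# as epoch seconds. The report is then a single aggregate query.
import sqlite3

NOW = 1_011_600_000          # pretend "now", epoch seconds
WINDOW = 14 * 24 * 3600      # 14 days

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE downtime (host TEXT, down_start INTEGER, down_end INTEGER)"
)
conn.executemany("INSERT INTO downtime VALUES (?,?,?)", [
    ("hostA", NOW - 3600, NOW - 3000),                           # 10 min, recent
    ("hostB", NOW - 40 * 24 * 3600, NOW - 40 * 24 * 3600 + 600), # too old
])
report = conn.execute("""
    SELECT host, SUM(down_end - down_start) AS seconds_down
    FROM downtime
    WHERE down_start >= ?
    GROUP BY host
""", (NOW - WINDOW,)).fetchall()
print(report)  # [('hostA', 600)]
```

Restricting the report to a specific set of 10 hosts would just add an `AND host IN (...)` clause to the same query.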

I'll also have to take a closer look at MRTG again (which we've got
installed as well). So far we've only used it for general traffic
monitoring on our routers, and as far as I remember there's some
built-in data compression in the graphs, so with long-term logs
the data becomes more and more fuzzy... which is fine for traffic
stats, but with a database we could re-generate precise downtime
information whenever needed.

Thanks again
  Uwe

-Original Message-
From: Andrew Ryan [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 11, 2002 19:58
To: Kreibaum, Uwe
Cc: [EMAIL PROTECTED]
Subject: Re: logging to database and severity


At 12:43 PM 1/11/02 +0100, you wrote:
> I had a look at dtquery but must admit that it is far beyond my
> current perl know-how to adapt it...

dtquery uses mon-generated downtime logs as its 'database'. It may not be 
what you want or need for your case however. The nice thing about it is 
that it gives you a query interface and doesn't require any extra work past 
enabling downtime logs and setting up dtquery initially.



>Furthermore, I'm looking for a way to classify the failures reported by
>fping:

What you describe here is possible by modifying fping.monitor. You
could easily add a flag that calculated the number of hosts the monitor
was passed, and only gave an error if a certain number, or percentage,
of those hosts were marked as unreachable. This would be pretty simple
Perl. Definitely leverage the 'exit=' flag for sending out different
alerts.
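The thresholding suggested above can be sketched briefly - in Python here rather than the Perl of the real fping.monitor, and with the threshold, exit codes, and function name all invented for illustration. Distinct exit codes are what mon's 'exit=' alert matching would then route to different alerts:

```python
# Sketch: map a failure ratio to distinct exit codes so that a "normal"
# handful of dead hosts is merely logged, while a major outage alerts.
def failure_exit_code(total, unreachable, major_pct=30):
    """0 = all up, 1 = background noise (just log), 2 = major outage (alert)."""
    if total == 0 or unreachable == 0:
        return 0
    pct = 100.0 * unreachable / total
    if pct >= major_pct:
        return 2   # e.g. routed to a paging alert via exit= matching
    return 1       # e.g. routed to a log-only alert

print(failure_exit_code(50, 3))   # 1  (the usual 3-5 dead hosts)
print(failure_exit_code(50, 35))  # 2  (70% down: major failure)
print(failure_exit_code(50, 0))   # 0
```

In the real monitor, the script would `exit` with this code after printing the summary line of failed hosts.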


andrew






logging to database and severity

2002-01-11 Thread Kreibaum, Uwe

Hello,

we're running mon to monitor a net of (currently) 1,100 nodes. To
gather long-term statistics on failures, we'd like to store the results
from fping in a database. Has anyone done something like this before? I
had a look at dtquery but must admit that it is far beyond my current
Perl know-how to adapt it... bugzilla-alert would be fine if there were
an upalert to close the bugs (which is not possible because mon doesn't
know the bug-id...)

Furthermore, I'm looking for a way to classify the failures reported by
fping: we've grouped the hosts into hostgroups of 50 each (there will
be about 300 hostgroups when everything is installed). Generally, 3 to
5 hosts are not reachable (turned off or whatever); this is our normal
condition, so it only needs to be recorded. In case of a major network
failure, the numbers go up to 70 percent, or even the whole hostgroup
is down. Is there a way to generate alerts only in these cases? E.g. if
more than 30 percent are down, generate alert 1; otherwise just log the
downtime (or whatever is appropriate).

I think (though I haven't tried it) this is a different functionality
from the severity patch just posted to the list...?

Uwe