Andrew,

Yes, I agree on "the PITA stuff is building a good reporting interface, good presentation, and meaningful graphs", and of course dtquery also runs on my monitoring site. We also run Bugzilla for mon alerts; very nice. But if alerts recover automatically (a lot will do... lucky us... > 90%), there is no need to focus on that problem anymore; only when the same problem keeps returning do we use Bugzilla for normal bug handling. One PITA is telling Bugzilla what to do with an UP alert, which is not a bug but a bug fix (it should give the initial ALERT (= bug) the fixed/closed status).
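For what it's worth, the way I imagine wiring that up: mon passes a -u flag to the alert script when it is an upalert, so one wrapper can either file a ticket or close the one it opened earlier. A minimal sketch in Python, with the actual Bugzilla calls stubbed out as an in-memory dict (the helper names open_ticket/close_ticket are my own invention, not a real Bugzilla API):

```python
# In-memory stand-in for the Bugzilla database, keyed by (hostgroup, service).
# In real life open_ticket()/close_ticket() would talk to Bugzilla instead.
tickets = {}
next_id = 1

def open_ticket(group, service, summary):
    """File a new 'bug' for a down alert (stub)."""
    global next_id
    bug_id = next_id
    next_id += 1
    tickets[(group, service)] = {"id": bug_id, "status": "NEW", "summary": summary}
    return bug_id

def close_ticket(group, service):
    """An up alert is not a bug but a bug fix: mark the initial alert FIXED."""
    bug = tickets.get((group, service))
    if bug and bug["status"] != "FIXED":
        bug["status"] = "FIXED"
        return bug["id"]
    return None  # no open bug to close

def handle_alert(group, service, summary, is_upalert):
    """Entry point: mon passes -u for upalerts, so route on that flag."""
    if is_upalert:
        return close_ticket(group, service)
    return open_ticket(group, service, summary)
```

So the down alert files the bug and the matching up alert gives that same bug the fixed status, which is exactly the manual step we want to get rid of.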
I'm trying to get not only up and down reports, but also a few other performance figures like CPU/DISK/NET/PROCESS load (mostly from SNMP agents, I think). As a first try, an SNMP monitor script could dump the data from each tested host into a DB as well, right after the normal mon testing part. With this, I can also check and report on a server's resources after it has been operational for some time. If some resource runs out of limits within a certain time, that signals the need for, say, a second web server. We use a 70% policy here: whenever a server is consistently above 70% resource use, without any reason/errors, we normally add extra servers or extra resources (like RAM or DISK).

Behind all this, of course I could just set up RRDTOOL / MRTG or other tools to graph such data, BUT: with a monitoring site of 200+ servers and at least 10 performance counters each (cpu/disk/net/etc.), out of the box you get 200 * 10 = 2000 web pages to check, AND * 4 (day, week, month, year) = 8000 graphs... Also, the same data would be queried twice (by mon and by an extra collector daemon), and ad-hoc manual/user queries are not possible.

The last part is to create a meaningful report with up/down time, resource use and performance together, bundled/compacted into 1 or 2 end-user (paper) pages. (OK, something everybody wants... but after 6 pilots of commercial products we think this is, as of today, not available; again we returned to open source and a 'do it yourself' policy.) If all the needed data is located in a DB, then it's possible to create this (with an MS Access (ODBC) frontend, a web frontend, or an automated process which e-mails the results, for example).

Well, anybody another point of view?? This is the goal I want to reach somehow, with or without extra daemons beside mon. It could also be called 'many to one' data processing??
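To make the "last step in each monitoring script" idea concrete: every monitor appends its sample to one table, and reporting becomes plain SQL instead of 8000 pre-rendered graphs. A rough sketch in Python, with SQLite standing in for the real PostgreSQL server (the table and column names are my own invention, not anything mon defines):

```python
import sqlite3

# One wide-open samples table; every monitor script appends here as its
# final step, after the normal pass/fail test.
conn = sqlite3.connect(":memory:")  # stand-in for the real PostgreSQL server
conn.execute("""
    CREATE TABLE samples (
        ts      INTEGER,   -- unix timestamp of the check
        host    TEXT,      -- which of the 200+ servers
        counter TEXT,      -- 'cpu', 'disk', 'net', 'process', ...
        value   REAL       -- whatever the monitor measured (e.g. % used)
    )
""")

def store_sample(ts, host, counter, value):
    """The proposed last step of every monitor script: dump, don't graph."""
    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 (ts, host, counter, value))

# A few fake CPU readings for two hosts
for ts, host, value in [(1, "web1", 80), (2, "web1", 75), (3, "web1", 85),
                        (1, "web2", 30), (2, "web2", 35), (3, "web2", 40)]:
    store_sample(ts, host, "cpu", value)

def hosts_over_limit(counter, limit=70.0):
    """The 70% policy as an ad-hoc query: which hosts average above the limit?"""
    rows = conn.execute(
        "SELECT host, AVG(value) FROM samples WHERE counter = ? "
        "GROUP BY host HAVING AVG(value) > ?", (counter, limit))
    return dict(rows.fetchall())

print(hosts_over_limit("cpu"))  # web1 averages 80 -> candidate for extra resources
```

The same table would also feed the 1-2 page end-user report (uptime, resource use, trends) from any ODBC/web frontend; in the real setup the INSERT would just go through Perl DBI to PostgreSQL instead of sqlite3.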
greetzz
dick
<>

----- Original Message -----
From: "Andrew Ryan" <[EMAIL PROTECTED]>
To: "Dick de Waal" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, September 29, 2001 00:11
Subject: Re: New idea / database question....

> So, first off, let me point you to dtquery, in case you're not familiar with
> it. It's a tool I wrote that does a lot of what you're asking for, except the
> relational DB part (which really is not very hard; the PITA stuff is building
> a good reporting interface, good presentation, and meaningful graphs).
> http://www.nam-shub.com/files/dtquery/
>
> There are a couple of fundamental problems here in creating such a system
> with today's mon:
>
> 1) The information which mon's monitors give you varies widely in format,
> both in the summary line and especially in the detail area. Effectively this
> makes it impossible to get accurate information on a per-host basis across a
> wide variety of alerts. This is planned to change in post-1.0 versions of mon
> (1.1/1.2?) to allow for better reporting, especially on a per-host basis. I
> guess this isn't a problem if you have a 1:1 hostgroup/host mapping, but that
> is not common for larger installations.
>
> 2) If you use mon's dtlog to get your downtime information, you miss certain
> things (detail output from alerts, unchecked services, and services which go
> down but don't come up before mon is reset or shut down).
>
> This also affects any attempt to build a meaningful database of downtime
> based on alerts. I considered this but didn't have the time and energy to
> work on something which, at best, would still be a hack.
>
> You can use something like the bugzilla_alert script to log alerts into a
> bugzilla database, which will give you some reporting/querying capabilities.
> That's probably the cheapest, easiest way to get mon info into a relational
> DB.
>
> rrdmon probably comes closest to getting the most accurate data, because it
> queries a running mon server for status every 30-60 sec, but does so with a
> very high performance penalty (it consumes a lot of CPU) and doesn't have any
> advanced querying features. It also doesn't let you get any more info out of
> the alert and monitor scripts other than "hostgroup up" or "hostgroup down".
>
> I've heard of people that leverage their mon-generated rtt logs into a DB of
> some sort and do stuff with them, but none of that has been released to the
> public.
>
> Other than that, my plan is basically to wait until the next version of mon
> allows us to do the job better from the ground up. But if anyone is going to
> work on something in the meantime, feel free to use me as a resource.
>
> andrew
>
> On Friday 28 September 2001 07:26 am, Dick de Waal wrote:
> > Thanks for the replies in the first place!
> > Let me explain again:
> >
> > Of course I'm using MRTG / RRDMON / RRDTOOL / and other good graphers!
> > They can give a perfect graphical overview, but there is MORE
> > informational (performance) data used within mon which is only used for
> > the monitoring part, and after that (good or false) it's thrown away.
> >
> > Scott puts it right (I agree also):
> > "Mon is an excellent monitoring tool, as it was designed to be; that
> > doesn't necessarily make it an excellent tool for measuring performance,
> > however. I'd prefer that core development of mon continue in the direction
> > of monitoring state and sending notifications, and leave the task of
> > graphing, reports, etc. to other tools."
> >
> > If we leave mon to monitoring, states and alerts, the need for a global
> > DBMS data store is there! Once the data is in a DBMS, separate tools can
> > handle it without disturbing mon.
> >
> > Which data am I talking about? Not the complete SNMP MIB tree :-)), for
> > example, but just the data which mon is already collecting for correct
> > operation! For example: fping.monitor results; snmpvar.monitor results
> > (cpu/process/disk/etc); http_xxx.monitor results; etc. Just that data
> > needs to be analysed in real time for correct operation (= mon), BUT it
> > is also important for reports (trends / pro-active / average service
> > level reached / etc) after running for x time.
> >
> > This prevents the data being retrieved twice, and why not? It could
> > (/should) be the last step in each monitoring script to store the
> > retrieved monitoring data in a database and nothing more!
>
> > This idea is just to prevent mon being used as a performance
> > grapher/reporter and other _hacks_; let different programs do the job!
> >
> > Anyone some perl --> postgresql commands/script lines for me?
> >
> > greettz
> > dick
> > <>
> >
> > ----- Original Message -----
> > From: "Scott Prater" <[EMAIL PROTECTED]>
> > To: "Dick de Waal" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Friday, September 28, 2001 10:36
> > Subject: RE: New idea / database question....
> >
> > > Right now, we're using a combination of mon with HP-OV to cover
> > > monitoring. I'm looking into RRDTool for reporting.
> > >
> > > After spending a couple of years studying the topic, I've finally found
> > > it useful to separate (mentally) the task of monitoring state from the
> > > task of measuring performance. Basically, what you're asking are two
> > > different questions:
> > >
> > > * Is my system OK right now? (fundamentally a yes/no question, although
> > > there are many degrees of "yes, but...") This is monitoring of state.
> > > Tools such as Tivoli, HP-OV, mon, Big Brother, etc. focus on answering
> > > this question.
> > >
> > > * How is my system performing overall? (well, poorly, sometimes better
> > > than other times, etc.) This is monitoring of performance over time,
> > > usually shown in graphs. Tools such as MRTG and RRDTool focus on
> > > answering this question.
> > >
> > > Of course, the lines blur, especially when you talk of what different
> > > products provide -- the answer to the second question can determine the
> > > answer to the first. Tools such as MRTG provide limited threshold
> > > checking and notification, but they are no substitute for a
> > > full-featured monitoring system, such as mon. On the other hand, tools
> > > such as mon can be adapted to save state information for reporting
> > > purposes (with modules such as rrdmon), but they're no substitute for
> > > reporting tools such as RRDTool.
> > >
> > > So far, I haven't found a reasonably-priced (or freeware) package that
> > > does it all to my taste. Tivoli and HP-OV come close, but they are
> > > still focused more on the first question than on the second. There's
> > > the OpenNMS project (http://www.opennms.org/), an open source freeware
> > > alternative to tools such as HP-OV, but as far as I can tell, it's not
> > > really ready for primetime yet.
> > >
> > > So, like most, I use a combination of tools to give me the big picture.
> > > As another person pointed out, it's a pain in the neck to have to
> > > configure several pieces of software to send multiple queries just to
> > > get data on one element -- you usually end up cobbling together a
> > > series of management scripts to tie it all together. But as the two
> > > tasks (checking state and measuring performance) are fundamentally two
> > > different tasks, albeit very closely related, I prefer to work with
> > > tools optimized to perform either one or the other.
> > >
> > > Mon is an excellent monitoring tool, as it was designed to be; that
> > > doesn't necessarily make it an excellent tool for measuring
> > > performance, however. I'd prefer that core development of mon continue
> > > in the direction of monitoring state and sending notifications, and
> > > leave the task of graphing, reports, etc. to other tools.
> > >
> > > my two cents...
> > >
> > > Scott Prater
> > > Dpto. Sistemas
> > > [EMAIL PROTECTED]
> > >
> > > SERVICOM 2000
> > > Av. Primado Reig, 189 entlo.
> > > 46020 Valencia - Spain
> > > Tel. (+34) 96 332 12 00
> > > Fax. (+34) 96 332 12 01
> > > www.servicom2000.com
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of
> > > Dick de Waal
> > > Sent: Thursday, 27 September 2001 22:59
> > > To: [EMAIL PROTECTED]
> > > Subject: New idea / database question....
> > >
> > > Hello All!
> > > Did anybody put the monitoring (performance) data from diverse
> > > monitoring scripts into a database (like postgresql) for further
> > > analysing/reporting? Up- and downtimes are not enough for me; I also
> > > want to store the SNMP (cpu, disk, process, etc), HTTP response, etc.
> > > data in this database to create service level reports... or even,
> > > because the data is then available in real time, do some pro-active
> > > SLA monitoring!!!
> > > (Retrieving is somehow already done for the monitoring part... and
> > > after comparing in the monitoring scripts, this data is thrown away...
> > > but it is just useful!!)
> > >
> > > Has anybody some ideas/scripts, or wanna be a beta/alpha/stable
> > > tester????
> > > I'm now using the latest version and _of course_ it's running well
> > > again!! Even on my test setup on a Sony Vaio laptop....
> > >
> > > greeetzz
> > > dick
> > > <>
> > > (not really a perl programmer..........)
