Re: MON_LAST_* Not actually the last output...
On Tue, 18 Mar 2003, Jim Trocki wrote:

> > makes reading the alert a little confusing as the only difference
> > between an alert and an upalert is one word in the subject header.
>
> the rationale is to show you the specifics of what was not working in
> the past but now is working. if the last summary / output in the
> upalert was from the successful test you would not see the detail of
> what hostgroup members were having the problem.

That information is already included when the alert is called. Depending on the settings, alerts are also sent whenever the output of the monitor changes. The upalert carries no new or interesting information.

Ever since my first upalert I've felt that something is not quite right about the information presented. The other people in my department have the same feeling. The message is just confusing: Is it up? But it says that hosts are still unreachable!

If this was a democracy (I know that it is not ;-) then I would vote to have David's patch included in the next release...

By the way David, it looks like your patch changes the content of the standard input given to the alert, but does it actually modify the values of the MON_LAST.. variables?

Regards,
Mark.

--
Mark Lawrence [EMAIL PROTECTED]
___
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon
Re: MON_LAST_* Not actually the last output...
On Wed, 19 Mar 2003, Mark Lawrence wrote:

> If this was a democracy (I know that it is not ;-) then I would vote
> to have David's patch included in the next release...

Or maybe it would be better to have both the last failure output and the last success output in different environment variables so that the alert can do what it likes.

cheers,
Mark.

--
Mark Lawrence [EMAIL PROTECTED]
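A minimal sketch of what "the alert can do what it likes" might look like when an alert has both outputs to hand. It assumes the latest monitor output arrives on standard input and the previous output is in the MON_LAST_OUTPUT environment variable; whether that holds depends on the mon version and on which patch is applied. The function names are hypothetical and not part of mon.

```python
from typing import Mapping, TextIO


def build_upalert_body(current_output: str, previous_output: str) -> str:
    """Format an upalert body that keeps the two outputs clearly apart."""
    lines = [
        "Current status (latest monitor run):",
        current_output.strip() or "(no output)",
        "",
        "Previous output (what was failing before):",
        previous_output.strip() or "(no output)",
    ]
    return "\n".join(lines)


def render_from_environment(stream: TextIO, environ: Mapping[str, str]) -> str:
    """Build the body from an input stream and an environment mapping.

    Assumption: the stream carries the latest monitor output and
    MON_LAST_OUTPUT carries the previous one.
    """
    return build_upalert_body(stream.read(), environ.get("MON_LAST_OUTPUT", ""))
```

An alert script would call render_from_environment(sys.stdin, os.environ) and mail or page the result. Labelling the two sections is what removes the "is it up or not?" confusion, whichever variable ends up carrying which output.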
Re: MON_LAST_* Not actually the last output...
--On Wednesday, March 19, 2003 9:08 AM +0100 Mark Lawrence [EMAIL PROTECTED] wrote:

> By the way David, it looks like your patch changes the content of the
> standard input given to the alert, but does it actually modify the
> parameters for the MON_LAST.. variables?

Ahh, that's what I get for posting a patch without looking completely at the results. You're right, that patch changes the standard input passed to the alert (which is what most alerts I've seen actually output as their detail message). MON_LAST_OUTPUT remains the *previous* output, which seems logical, given the variable name. This lets the alert include both, if it so desires.

> Ever since my first upalert I've felt that something is not quite
> right about the information presented. The other people in my
> department have the same feeling. The message is just confusing - Is
> it up? But it says that hosts are still unreachable!

I agree, and my users had the same response. An OK message that in the body says the server is still down tends to confuse people. In fact I'd say they threatened to lynch our mon team if we didn't fix it.

I think the best answer is to have per-host status tracking. This topic came up on the mailing list a while back, and I said I'd write the code. Sadly that rewrite hasn't started yet, and isn't going to start for another couple of months. But the goal will be results similar to the following:

Host A goes down:
    Old: ALERT for Host A
    New: ALERT for Host A

Host B goes down:
    Old: ALERT for Host A and B
    New: New alert type STATUSCHANGEALERT for the hostgroup that says
         Host B now down, Host A still down

Host B comes back up:
    Old: ALERT for Host A
    New: STATUSCHANGEALERT - Host B OK, Host A still down

Host A gets disabled by a user:
    Old: on next monitor invocation, UPALERT
    New: immediate STATUSCHANGEALERT (or perhaps some other new type):
         Host A now disabled, by user X. Remaining hosts all OK.
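The old/new comparison above could be sketched roughly as follows. This is only an illustration of the idea, not code from mon or from the planned rewrite: the state names ("ok", "down", "disabled") and both function names are assumptions made for the example.

```python
from typing import Dict, List, Optional


def describe_status_change(previous: Dict[str, str],
                           current: Dict[str, str]) -> List[str]:
    """Describe per-host changes: changed hosts first, then hosts that are
    still in a non-OK state. Hosts not seen before are treated as "ok"."""
    changed: List[str] = []
    unchanged: List[str] = []
    for host in sorted(current):
        new = current[host]
        old = previous.get(host, "ok")
        if new != old:
            changed.append(f"{host} now {new}")
        elif new != "ok":
            unchanged.append(f"{host} still {new}")
    return changed + unchanged


def format_status_change(previous: Dict[str, str],
                         current: Dict[str, str]) -> Optional[str]:
    """Return the alert text, or None when no host changed state."""
    if all(previous.get(h, "ok") == s for h, s in current.items()):
        return None
    return ", ".join(describe_status_change(previous, current))
```

For example, with Host A already down and Host B newly failing, format_status_change({"A": "down", "B": "ok"}, {"A": "down", "B": "down"}) gives "B now down, A still down", which mirrors the second scenario in the list.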
This also will allow for several other useful changes, like this:

Host A goes down:
    same as above

Failure gets ack'ed by user:
    ACK-ALERT, including the text of the ACK

Host B fails:
    Old: No alert (bad! I've changed this in my mon environment)
    New: STATUSCHANGEALERT: Host B down, Host A still down, but acked.

Host B comes back up:
    Old: Still no alert (we're still acked)
    New: STATUSCHANGEALERT: Host B OK, Host A still down, but acked.

Another useful feature will be the ability to avoid the following: A goes down, ALERT for inability to ping A, A comes back, UPALERT for A, but the host is still booting, so the web server isn't up yet, so we alert for http.

The fix for this will be per-host dependency memory. The dependency of group:http on group:ping will be per-host, and have a configurable look-back feature. So you could configure group:http to not alert if the individual host has failed the ping test within the last 5 minutes, for example.

I already have something very much like this in place in my current Mon version, but it's really a hack. (Comparing this host to the summary output of the last failure of the dependency.) It works 90% of the time, but isn't the truly correct solution. Per-host status tracking is.

Assuming management doesn't direct our efforts elsewhere, I expect the per-host status work will start sometime this summer. And given the strong desire in my user community for better per-host tracking, including for logging purposes, I don't expect management to do that.

-David

David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University
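The per-host look-back described in this message might be sketched like this. It is a guess at the shape of the idea under stated assumptions, not the hack mentioned above and not anything in mon: the function name, the dictionary of last-failure timestamps, and the 300-second default are all hypothetical.

```python
from typing import Dict


def should_suppress(host: str,
                    now: float,
                    last_dependency_failure: Dict[str, float],
                    look_back_seconds: float = 300.0) -> bool:
    """Return True if an alert for `host` should be held back because the
    host failed its dependency check (e.g. ping) within the look-back window.

    `last_dependency_failure` maps host name to the time (seconds since the
    epoch) of that host's most recent dependency failure.
    """
    last_failure = last_dependency_failure.get(host)
    if last_failure is None:
        # The host has never failed the dependency, so nothing to suppress.
        return False
    return (now - last_failure) <= look_back_seconds
```

With a 5-minute window, a host that failed ping 200 seconds ago would have its http alert held back, while one that failed 400 seconds ago would alert normally. Keeping the timestamps per host is the point: one rebooting machine does not silence http alerts for the rest of the group.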
Re: Monitoring NT Services
I've found monitoring NT over SNMP to be extremely unreliable, but it's possible. This is what I did for NT 4.0; it may be different for Win2k:

Check out the site 'http://www.techie.net.nz/networking/mrtg/' (which has changed drastically since I last used it).

Basically, you need to install the perf SNMP extension on each monitored host. You can either find it online or get it from the NT Resource Kit CD. If it is already included in Win2k this will be much easier.

Then find a copy of mbrowse for Linux; I think it is on sourceforge.net. Install the Windows perf extension MIB on your Linux box and fire up mbrowse. Browse for the values you are interested in.

There's probably a canned SNMP monitor for mon, but I am testing with a shell script that uses ucd-snmp's 'snmpget' and 'snmpwalk'.

All of these can be obtained free online, though the perf SNMP extension is Microsoft property and may have a license.

Brian Fender

Winters, Jason [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
03/18/03 01:11 PM
To: '[EMAIL PROTECTED]' [EMAIL PROTECTED]
cc:
Subject: Monitoring NT Services

> I'm trying to set up Mon to monitor NT services running on Windows
> 2000 boxes. Unfortunately, I have little experience with Mon or with
> network management/monitoring. Has anyone found a way to monitor NT
> services reliably?
>
> I've considered using SNMP; however, I haven't found any way to use
> SNMP to check the status of a running service. I've considered using
> SNMP traps; however, this doesn't seem terribly reliable since the
> UDP packet might never make it to the server running Mon. Using traps
> tells me something bad happened but it doesn't give me the sense of
> security that actually checking the services periodically would give
> me.
>
> Has anyone else in the Mon community found a way to solve this
> problem? If possible I'd like to implement something that doesn't
> require installing software (particularly commercial software) on all
> of my NT boxes. Any suggestions, experiences, best practices, things
> to avoid, etc. would be greatly appreciated.
>
> Thanks,
> Jason Winters
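For the periodic-polling approach, a monitor built around snmpget could be shaped roughly as below. This is a sketch under several assumptions: that mon passes the hosts to check as arguments, reads the first output line as a summary of failed hosts, and treats a non-zero exit status as failure; that your snmpget accepts the `-v` and `-c` options (older ucd-snmp releases used a different argument order); and that you have already found a suitable OID with mbrowse. The function names are hypothetical, and no real OID is given here.

```python
import subprocess
from typing import Callable, List, Tuple


def snmp_value(host: str, oid: str, community: str = "public") -> str:
    """Fetch one value with the snmpget command-line tool.

    Assumption: snmpget is installed and accepts -v and -c.
    """
    result = subprocess.run(
        ["snmpget", "-v", "1", "-c", community, host, oid],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "snmpget failed")
    return result.stdout.strip()


def run_monitor(hosts: List[str],
                check: Callable[[str], str]) -> Tuple[int, str]:
    """Run `check` on every host and return (exit_status, output).

    The first output line lists the failed hosts; detail lines follow.
    """
    failed: List[str] = []
    details: List[str] = []
    for host in hosts:
        try:
            details.append(f"{host}: {check(host)}")
        except Exception as exc:  # any error counts as a failed host
            failed.append(host)
            details.append(f"{host}: FAILED ({exc})")
    summary = " ".join(failed)
    return (1 if failed else 0), "\n".join([summary] + details)
```

A script would wrap this as run_monitor(sys.argv[1:], lambda h: snmp_value(h, oid)), print the output, and exit with the returned status. Passing the check in as a function keeps the reporting logic testable without a live SNMP agent.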
Re: New release of mon?
On Wed, Mar 19, 2003 at 02:49:43PM +0100, Hans Kinwel wrote:

Why not put mon on sourceforge? It's already there - see sourceforge.net/projects/mon . Alas, the latest version there is 0.38.20. And Jim is both the admin and the only developer registered.

Jim, have you thought about reactivating the SourceForge project and delegating responsibility for various parts of Mon? At the very least, you could let others maintain the monitors and alerts, where a lot of the useful patches have taken place.

--
Ed
volunteer to maintain
I will volunteer to be a maintainer and get patches applied. I've done a little writing of alerts. Consider this my payback for using mon.

--
Ted Serreyn
Serreyn Network Services, LLC
http://www.serreyn.com/
Re: MON_LAST_* Not actually the last output...
Hello,

> the rationale is to show you the specifics of what was not working in
> the past but now is working. if the last summary / output in the
> upalert was from the successful test you would not see the detail of
> what hostgroup members were having the problem.

You would use the history list.

This behavior is confusing unless you're a mon admin. I remember I had to suppress it from minotaur.cgi because users think something is wrong when nothing is. They are right to think that, since we all think that until we've read the mon manpage five times.

--
Au revoir,                                  33 (0) 2 99 78 62 49
Gilles Lamiral. France, L'Hermitage (35590) 33 (0) 6 20 79 76 06
http://www.sri.ucl.ac.be/SRI/frfc/rfc1855.fr.html