Re: MON_LAST_* Not actually the last output...

2003-03-19 Thread Mark Lawrence
On Tue, 18 Mar 2003, Jim Trocki wrote:

  makes reading the alert a little confusing as the only difference between
  an alert and an upalert is one word in the subject header.

 the rationale is to show you the specifics of what was not working in the
 past but now is working. if the last summary / output in the upalert was
 from the successful test you would not see the detail of what hostgroup
 members were having the problem.

That information is available/included when the alert is called. Depending
on the settings alerts are also sent whenever the output of the monitor
changes. There is no new or interesting information sent for the upalert.

Ever since my first upalert I've felt that something is not quite right
about the information presented. The other people in my department have
the same feeling. The message is just confusing - Is it up? But it says
that hosts are still unreachable!

If this was a democracy (I know that it is not ;-) then I would vote to
have David's patch included in the next release...

By the way David, it looks like your patch changes the content of the
standard input given to the alert, but does it actually modify the
parameters for the MON_LAST_* variables?

Regards, Mark.
--
Mark Lawrence
[EMAIL PROTECTED]


___
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon


Re: MON_LAST_* Not actually the last output...

2003-03-19 Thread Mark Lawrence
On Wed, 19 Mar 2003, Mark Lawrence wrote:

 If this was a democracy (I know that it is not ;-) then I would vote to
 have David's patch included in the next release...

Or maybe it would be better to have both the last failure output and the
last success output in different environment variables so that the alert
can do what it likes.

cheers, Mark.
--
Mark Lawrence
[EMAIL PROTECTED]




Re: MON_LAST_* Not actually the last output...

2003-03-19 Thread David Nolan


--On Wednesday, March 19, 2003 9:08 AM +0100 Mark Lawrence 
[EMAIL PROTECTED] wrote:

By the way David, it looks like your patch changes the content of the
standard input given to the alert, but does it actually modify the
parameters for the MON_LAST_* variables?


Ahh, that's what I get for posting a patch without looking completely at the 
results.

You're right, that patch changes the standard input passed to the alert 
(which is what most alerts I've seen actually output as their detail 
message.)  MON_LAST_OUTPUT remains the *previous* output, which seems 
logical, given the variable name.  This provides the ability for the alert 
to include both, if it so desires.
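
As a rough illustration of that last point, here is a minimal sketch of how an alert might combine the two sources. It assumes only what is described above: the current output arrives on standard input and the previous output is in MON_LAST_OUTPUT. The function name and the layout of the message are hypothetical, and this is not code from mon or from the patch.

```python
def build_alert_body(current_output, environ):
    """Combine the current monitor output with the previous one.

    current_output: text the alert read from standard input.
    environ: a mapping such as os.environ.
    """
    # MON_LAST_OUTPUT is the variable named in this thread; per the message
    # above it holds the *previous* output, not the current one.
    previous_output = environ.get("MON_LAST_OUTPUT", "")
    lines = ["Current output:", current_output.rstrip() or "(none)"]
    if previous_output.strip():
        lines += ["", "Previous output (MON_LAST_OUTPUT):", previous_output.rstrip()]
    return "\n".join(lines)
```

A real alert script would call this with `sys.stdin.read()` and `os.environ` and then mail or log the result.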


Ever since my first upalert I've felt that something is not quite right
about the information presented. The other people in my department have
the same feeling. The message is just confusing - Is it up? But it says
that hosts are still unreachable!

I agree, and my users had the same response.  An OK message that in the 
body says the server is still down tends to confuse people.  In fact I'd 
say they threatened to lynch our mon team if we didn't fix it.

I think the best answer is to have per-host status tracking.  This topic 
came up on the mailing list a while back, and I said I'd write the code. 
Sadly that rewrite hasn't started yet, and isn't going to start for another 
couple of months.  But the goal will be results similar to the following:

Host A goes down:
    Old: ALERT for Host A
    New: ALERT for Host A
Host B goes down:
    Old: ALERT for Host A and B
    New: new alert type STATUSCHANGEALERT for the hostgroup that says
         Host B now down, Host A still down
Host B comes back up:
    Old: ALERT for Host A
    New: STATUSCHANGEALERT - Host B OK, Host A still down
Host A gets disabled by a user:
    Old: on next monitor invocation, UPALERT
    New: immediate STATUSCHANGEALERT (or perhaps some other new type):
         Host A now disabled by user X.  Remaining hosts all OK.
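
The "New" lines in the list above amount to comparing per-host status between two monitor runs. A minimal sketch of that comparison follows. The function name, the dictionary representation, and the exact message wording are assumptions made for illustration; the rewrite described here has not been written yet, and the sketch ignores the disabled and acked states as well as the plain ALERT for the first failure.

```python
def describe_status_change(previous, current):
    """Return one line describing how per-host status moved between two runs.

    Both arguments map a host name to True (up) or False (down).
    Returns None when no host changed state.
    """
    changed, still_down = [], []
    for host in sorted(current):
        is_up = current[host]
        was_up = previous.get(host, True)  # assumption: unseen hosts were up
        if is_up != was_up:
            changed.append(f"{host} {'OK' if is_up else 'now down'}")
        elif not is_up:
            still_down.append(f"{host} still down")
    if not changed:
        return None  # nothing changed, so no status-change alert
    return "STATUSCHANGEALERT - " + ", ".join(changed + still_down)
```

For example, with Host A already down and Host B newly failing, the function returns "STATUSCHANGEALERT - Host B now down, Host A still down".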

This also will allow for several other useful changes, like this:

Host A goes down:  same as above

Failure gets ack'ed by user:  ACK-ALERT, including the text of the ACK

Host B fails:
    Old: No alert (bad!  I've changed this in my mon environment)
    New: STATUSCHANGEALERT: Host B down, Host A still down, but acked.
Host B comes back up:
    Old: Still no alert (we're still acked)
    New: STATUSCHANGEALERT: Host B OK, Host A still down, but acked.

Another useful feature will be the ability to avoid the following:
A goes down, ALERT for inability to ping A, A comes back, UPALERT for A, 
but the host is still booting, so the web server isn't up yet, so we alert 
for http.

The fix for this will be per-host dependency memory.  The dependency of 
group:http on group:ping will be per-host, and have a configurable 
look-back feature.  So you could configure group:http to not alert if the 
individual host has failed the ping test within the last 5 minutes, for 
example.
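
The look-back rule can be sketched as a small check. This is only an illustration of the idea, not mon code: the function name, the dictionary of last ping failures, and the 300-second default are all assumptions standing in for whatever the eventual configuration looks like.

```python
def should_alert_http(host, now, last_ping_failure, lookback_seconds=300):
    """Decide whether an http failure on `host` should raise an alert.

    last_ping_failure maps a host name to the time (in seconds) of its most
    recent failed ping test. The alert is suppressed while that failure is
    still inside the look-back window.
    """
    failed_at = last_ping_failure.get(host)
    if failed_at is None:
        return True  # no recorded ping failure for this host
    return (now - failed_at) > lookback_seconds
```

Keeping the failure times per host, rather than per hostgroup, is what lets one rebooting host stay quiet without hiding http failures on its neighbours.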

I already have something very much like this in place in my current Mon 
version, but it's really a hack.  (Comparing this host to the summary 
output of the last failure of the dependency.)  It works 90% of the time, 
but it isn't the truly correct solution.  Per-host status tracking is.

Assuming management doesn't direct our efforts elsewhere, I expect the 
per-host status work will start sometime this summer.  And given the strong 
desire in my user community for better per-host tracking, including for 
logging purposes, I don't expect management to do that.



-David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University


Re: Monitoring NT Services

2003-03-19 Thread brian_fender

I've found monitoring NT over SNMP to be extremely unreliable, but it's possible. This is what I did for NT 4.0; it may be different for win2k:

Check out the site at 'http://www.techie.net.nz/networking/mrtg/' (which has changed drastically since I last used it). Basically, you need to install the perf SNMP extension on each monitored host; you can either find it online or get it from the NT resource kit CD. If it is already included in win2k this will be much easier. Then find a copy of mbrowse for Linux; I think it is on sourceforge.net. Install the Windows perf extension MIB on your Linux box and fire up mbrowse. Browse for the values you are interested in. There's probably a canned SNMP monitor for mon, but I am testing with a shell script that uses ucd-snmp's 'snmpget' and 'snmpwalk'.

All of these can be obtained free online, though the perf SNMP extension is Microsoft property and may have a license.
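
For readers who want a starting point, here is a rough sketch of an snmpget-based check written in Python rather than shell. Several things in it are assumptions and not taken from the message above: the Net-SNMP 5.x argument order, the "public" community default, the function names, and the idea of reporting failed hosts on one summary line with a non-zero status (which is my understanding of what mon expects from a monitor). The OID is left as a parameter because the right one has to be looked up in the perf extension MIB.

```python
import subprocess


def snmp_get(host, oid, community="public"):
    """Return snmpget's output for one OID, or None if the query failed."""
    # Assumption: Net-SNMP 5.x argument order. Older ucd-snmp releases
    # expected "snmpget host community oid" instead.
    cmd = ["snmpget", "-v", "1", "-c", community, host, oid]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def failed_hosts(hosts, oid, query=snmp_get):
    """Return the hosts that did not answer the SNMP query."""
    return [host for host in hosts if query(host, oid) is None]


def monitor(hosts, oid, query=snmp_get):
    """Return (exit_status, summary_line) for the given hosts."""
    failed = failed_hosts(hosts, oid, query)
    return (1 if failed else 0), " ".join(failed)
```

A wrapper script would take the host names from its command line, print the summary line, and exit with the returned status. The `query` parameter exists only so the logic can be exercised without a live SNMP agent.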


Brian Fender








Winters, Jason [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
03/18/03 01:11 PM


To: '[EMAIL PROTECTED]' [EMAIL PROTECTED]
cc:
Subject: Monitoring NT Services


I'm trying to set up Mon to monitor NT services running on Windows 2000 boxes. Unfortunately, I have little experience with Mon or with network management/monitoring.  Has anyone found a way to monitor NT services reliably? I've considered using SNMP; however, I haven't found any way to use SNMP to check the status of a running service. I've considered using SNMP traps; however, this doesn't seem terribly reliable since the UDP packet might never make it to the server running Mon. Using traps tells me something bad happened, but it doesn't give me the sense of security that actually checking the services periodically would give me.

Has anyone else in the Mon community found a way to solve this problem? If possible I'd like to implement something that doesn't require installing software (particularly commercial software) on all of my NT boxes. Any suggestions, experiences, best practices, things to avoid, etc. would be greatly appreciated.

Thanks,
Jason Winters



Re: New release of mon?

2003-03-19 Thread Ed Ravin
On Wed, Mar 19, 2003 at 02:49:43PM +0100, Hans Kinwel wrote:
 Why not put mon on sourceforge?

It's already there - see sourceforge.net/projects/mon .  Alas,
the latest version there is 0.38.20.  And Jim is both the admin
and the only developer registered.

Jim, have you thought about reactivating SourceForge and delegating
responsibility for various parts of Mon?  At the very least, you could
let others maintain the monitors and alerts, where a lot of the useful
patches have taken place.

-- Ed


volunteer to maintain

2003-03-19 Thread Ted Serreyn
I will volunteer to be a maintainer and get patches applied.  I've done
a little writing of alerts.  Consider this my payback for using mon.


-- 
Ted Serreyn
Serreyn Network Services, LLC
http://www.serreyn.com/



Re: MON_LAST_* Not actually the last output...

2003-03-19 Thread Gilles LAMIRAL
Hello,

 the rationale is to show you the specifics of what was not working in the
 past but now is working. if the last summary / output in the upalert was
 from the successful test you would not see the detail of what hostgroup
 members were having the problem.

You would use the history list. This behavior is confusing unless you're
a MON admin. I remember I had to suppress it from minotaur.cgi because
users think something is wrong but nothing is. They are right to think
that since we all think that until we've read the mon manpage five times.

-- 
Au revoir,  33 (0) 2 99 78 62 49
Gilles Lamiral. France, L'Hermitage (35590) 33 (0) 6 20 79 76 06
http://www.sri.ucl.ac.be/SRI/frfc/rfc1855.fr.html