Re: Feature Request HostGroup in environment.

2010-01-06 Thread David Nolan
I think that's a great idea.

Which is probably why it already exists... :)

try MON_GROUP and MON_SERVICE

(I see that the documentation doesn't list those for monitors, only
for alerts, but they do exist and work.)
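
For illustration only, here is a rough Python sketch of how a monitor could use those variables to build a per-hostgroup state file name following the HostGroup.Monitor.state convention proposed below.  MON_GROUP is the variable named above; the state_dir default, the "nogroup" fallback and the function name are assumptions of this sketch, not anything mon defines.

```python
import os


def statefile_path(monitor_name, env=None, state_dir="/var/tmp"):
    """Return a state file path of the form HostGroup.Monitor.state.

    state_dir and the "nogroup" fallback are hypothetical choices made
    for this sketch only.
    """
    env = os.environ if env is None else env
    # MON_GROUP is set by mon in the monitor's environment (see above).
    group = env.get("MON_GROUP", "nogroup")
    return os.path.join(state_dir, f"{group}.{monitor_name}.state")
```

Two instances of the same monitor running in different hostgroups then get different files, which is the point of the request.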

-David



On Wed, Jan 6, 2010 at 1:30 PM, Nathan Gibbs nat...@cmpublishers.com wrote:
 What.

 Export the HostGroup of the about to be run monitor into its environment.
 Possibly something like
 MON_HOST_GROUP

 Why?
 Summary
 To give a monitor a way to distinguish itself from another instance of
 itself running in a different HostGroup.

 Detail
 For years I've had a situation where a server reboot or an snmpd service
 restart would occasionally put the reboot.monitor into an error state
 for a random amount of time longer than necessary.  Anywhere from 5
 minutes to hours.
 Sometimes the problem would fix itself, other times I would have to rm
 the state file.

 What was happening was that the reboot.monitor in HG1 where the reboot
 happened would write the state file just after the reboot.monitor in HG2
 would read it. Obviously the monitor in HG2 would write out incorrect
 data for the hosts in HG1.

 Yes, in this particular instance I could use the --statefile= option and
 be done with it.  However, I'm thinking beyond this particular monitor.
 If this feature were added:

 1. any monitor that needed a unique statefile name could trivially get one.
 $STATEFILE = "$ENV{MON_HOST_GROUP}.$ME.state";
 This or something like it could be added to the monitor template.

 2. All statefiles would follow a convention of HostGroup.Monitor.state.
 3. It would be easy to know what file was built by which monitor instance.
 4. No need to implement an option to set a statefile name.
 5. Simpler config as the above options are no longer needed.

 What do you think?
 :-)

 --
 Sincerely,

 Nathan Gibbs

 Systems Administrator
 Christ Media
 http://www.cmpublishers.com



 ___
 mon mailing list
 mon@linux.kernel.org
 http://linux.kernel.org/mailman/listinfo/mon





Re: syntax error using exclude_period

2009-12-07 Thread David Nolan
On Mon, Dec 7, 2009 at 4:51 PM, Alex Dean a...@crackpot.org wrote:

 This is using the mon package provided by Ubuntu Karmic (9.10).
 # dpkg --list | grep mon
 ... skipping a bunch of mono stuff ...
 ii  mon                                  0.99.2-13ubuntu1
         monitor hosts/services/whatever and alert ab



I'm pretty sure that's your problem right there.  I think this was a
bug in that version of Mon.  (And that version is at least 6 years
old.)

Please upgrade and try again.

-David



Re: Polling interval not updated when montraps are received

2009-11-20 Thread David Nolan
On Fri, Nov 20, 2009 at 12:28 PM, Anders Synstad ander...@basefarm.no wrote:

 On the server side however, the check works as a heartbeat.
 Checking if the localservice is still alive. But this is
 only performed once every hour.

My suggestion would be to use the 'redistribute' feature that was
added a while back on the agent, causing it to pass every status
update to the master, so you can see that the check was run recently
and the result was OK.

Then you can also set the traptimeout setting to ensure that you are
receiving traps at regular intervals, and alert if the agent stops
sending traps.

I did exactly this in a master/slave Mon setup.  (It's why I
implemented the redistribute feature.)

-David



Re: dns resolver monitoring?

2009-11-03 Thread David Nolan
There already is support in dns.monitor for recursive server testing.
(I wrote the code)

Try:
  dns.monitor -caching_only -query www.yahoo.com:A -query google.com:MX servername

-David

On Tue, Nov 3, 2009 at 4:11 PM, Nathan Gibbs nat...@cmpublishers.com wrote:
 * Kastus Shchuka wrote:
 On Tue, Nov 03, 2009 at 12:24:33PM -0500, Nathan Gibbs wrote:
 Isn't a resolver part of the OS libraries that do DNS lookups, not a
 network service that can be checked.

 Mike probably used resolver meaning recursive/caching server

 Yeah, you're right there.

 There is no sense in monitoring resolver libraries.

 My point exactly.  At least, that was what I was trying to say.
 :-)

 You may want to
 look at http://cr.yp.to/djbdns/separation.html for explanation.

 dns.monitor -caching_only record:TXT:result

 should be able to do it, but doesn't appear to work like the
 instructions say.

 There are too many aspects involved in recursive name resolution and there is
 no easy way (or sense) to monitor all of them.


 Right.

 dns.monitor is only proving that all authoritative DNS servers serve the
 same zone information. They do not check if published zone is correct, 
 though.

 One possible way to monitor a recursive/caching server would be to
 resolve a name coming from a known good authoritative server.
 It's fairly easy to script and convert into a monitor.

 Yeah,
 A few mods to dns.monitor would make that work.
 I don't plan on doing it this year, maybe next.


 --
 Sincerely,

 Nathan Gibbs

 Systems Administrator
 Christ Media
 http://www.cmpublishers.com








Re: multi depend gives error

2008-08-19 Thread David Nolan
Udo,

Mon depend expressions are Perl expressions, so the dependencies need
an operator between them.  You probably want:
  depend webservers1:ldap && gateway:ping
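
To see why a missing operator produces the eval error quoted below, here is a toy model in Python.  It is only a sketch of the idea (substitute each group:service with its current status, then evaluate the result as an expression), not mon's actual Perl code; eval_depend, the token pattern and the status dictionary are all made up for illustration, and only && and || are handled.

```python
import re

# Hypothetical pattern for a group:service reference in a depend line.
TOKEN = re.compile(r"[A-Za-z0-9_.-]+:[A-Za-z0-9_.-]+")


def eval_depend(expr, status):
    """Evaluate a depend-style expression against a {"group:service": bool} map."""
    def lookup(match):
        return "True" if status[match.group(0)] else "False"

    # Replace every group:service with its current status...
    translated = TOKEN.sub(lookup, expr)
    # ...and map the Perl-style boolean operators to Python ones.
    translated = translated.replace("&&", " and ").replace("||", " or ")
    # Two values with no operator between them are a syntax error here,
    # just as they are when the expression is evaluated as Perl.
    return bool(eval(translated, {"__builtins__": {}}, {}))
```

With both services up, "webservers1:ldap && gateway:ping" evaluates to true, while "webservers1:ldap gateway:ping" raises SyntaxError because nothing joins the two terms.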


-David

On Tue, Aug 19, 2008 at 7:23 AM, Udo Rader [EMAIL PROTECTED] wrote:

 Hi,

 I almost don't dare to ask ...

 Using the insanely old mon provided by debian (0.99.2-12) I get errors
 in my syslog when I use more than one dependency on a depend line, eg:

 ---CUT---
 watch foo
     service bar
         depend webservers1:ldap gateway:ping
         [...]
 ---CUT---

 Syslog then shows this:

 ---CUT---
 eval error for dependency starting at webservers1:ldap gateway:ping
 ---CUT---

 So if anybody has an idea how to deal with that, I would be very
 grateful (even if only updating solves it :-)

 Thanks!

 --
 Udo Rader
 http://www.bestsolution.at






Re: mon.cgi very slow, communication protocol improvements?

2008-04-10 Thread David Nolan
Rune,

You didn't mention some important bits of information, most
significantly what version of Mon, Mon::Client and mon.cgi you are
using.  There have been significant protocol changes in various
versions.  Speed problems that occurred with 0.99.2 are pretty much
gone with 1.2.0 for example.  Also what OS are you running on?

I'm using mon with well over 100 hostgroups without any performance
problems, with mon.cgi rendering a full page in under a second
typically.  I can't see how the performance would fail to scale to
600.

Off the top of my head I'm guessing that Storable would actually
increase the overhead in the mon server and cgi, as the data still has
to be transformed into the sharable form and then re-parsed.

-David


On Thu, Apr 3, 2008 at 4:09 AM, Rune Kristian Viken [EMAIL PROTECTED] wrote:

  I'm using mon to monitor  600 hostgroups, with an average of 8 or so
  services each.  The total number of hosts is  1000.

  The main problem I've come across is that mon.cgi is slow, and after some
  debugging, it seems that it's the communication with the mon-server that is
  slow.  I have to wait an average of about 12 seconds per pageview.

  I've tried digging around a bit, and it seems that two routines in
  query_opstatus take quite a long time:

 %op_success = mon_list_successes;
 %op_failure = mon_list_failures ;

  Looking at the communication protocol, it seems that the main drawback is
  that mon has to spin through a *lot* of data-structures and present them in
  a nice way.

  I was thinking that this might be accomplished faster by sharing the %watch and
  maybe %groups data-structure from mon, with the help of
  http://perldoc.perl.org/Storable.html .. but even though I feel I have
  decent know-how of mon-internals, I don't feel they're entirely up to
  scratch on how to implement this.

  Is it a good idea?  A horrible idea?  Am I barking up the wrong tree, with
  something else being the main problem here?

  --
  Rune Kristian Viken






Re: avoid duplicated alerts in a multi-host/mon context

2007-10-17 Thread David Nolan
On 10/17/07, Jacques Klein [EMAIL PROTECTED] wrote:
 Well, not really, or not enough in fact.
 If I understand the depend, it's a way to avoid multiple alerts by
 specifying dependencies between services in ONE mon.
 If I take this concept, then it would have to be extended to
 dependencies between services in a GROUP of mon(s) (one per host),
 interesting but seems very complicated.


If you configure each of your mon servers to send traps to all of the
others on status updates, then you can use dependencies on each server
based on state changes from other servers.

If they're all on one LAN you could probably even do that by sending
the status updates as broadcast packets.   I've never tried that, it
might take minor coding in Mon to make it process broadcast packets.
Of course even better would be multicast, but that would definitely
require some code changes.

The best way to cause all status updates to get propagated is by using
the 'redistribute' config option.  From the manual:

   redistribute alert [arg...]
  A service may have one redistribute option, which is a special
  form of an alert definition.  This alert will be called on every
  service status update, even sequential success status updates.
  This can be used to integrate Mon with another monitoring system,
  or to link together multiple Mon servers via an alert script that
  generates Mon traps.  See the ALERT PROGRAMS section above for a
  list of the parameters mon will pass automatically to alert
  programs.


Combine redistribute with trap.alert, define all your watches and
services on all servers, and then you can do lots of stuff with
dependencies.

-David



Re: Disable all alerting for 20 minutes

2006-12-13 Thread David Nolan


--On Thursday, December 14, 2006 00:20:40 +1030 Ben Ragg 
[EMAIL PROTECTED] wrote:

 Hi there,

 We often make changes to our network at 3am, and while every effort is
 made to disable the appropriate services, quite often something will slip
 through the cracks and wake someone up.

 Is there an option to disable all alerts from being sent for 20 minutes,
 and only display via the webpage (Failed, NoAlerts)


There are a few options right now for this.

If it's a regular occurrence you could configure an exclude period on the 
services, or configure the alert periods themselves to exclude that time 
frame.

If it's an irregular occurrence you can stop the mon scheduler via the web 
interface (or from cron), and restart it when done.  (The UI will see no 
updates, because nothing will be tested...)

Finally the most evil hack style method, which I've used on occasion, is:
cd mon-alert-dir
chmod -x *
... maintenance here
chmod +x *
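
If you go the chmod route, a wrapper that remembers the original modes is a bit safer than a blanket chmod +x afterwards.  A minimal Python sketch, assuming the alert directory contains only regular files you are allowed to chmod; alerts_disabled is a made-up name and this is not part of mon:

```python
import os
import stat
from contextlib import contextmanager

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


@contextmanager
def alerts_disabled(alert_dir):
    """Strip the execute bits from every file in alert_dir, then restore them."""
    saved = {}
    for entry in os.scandir(alert_dir):
        if entry.is_file():
            mode = stat.S_IMODE(entry.stat().st_mode)
            saved[entry.path] = mode
            os.chmod(entry.path, mode & ~EXEC_BITS)
    try:
        yield saved
    finally:
        # Put back exactly the modes that were there before, even if the
        # maintenance step raised an exception.
        for path, mode in saved.items():
            os.chmod(path, mode)
```

Usage would be "with alerts_disabled(alert_dir): ..." around the maintenance work, where alert_dir is whatever your mon alert directory is.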


You could also do something like write a script that uses Mon::Client and 
disables all hostgroups.  (This would show the status updates in the UI 
without sending alerts, at least with the current (CVS, 1.2.0rc1) Mon it 
would, I can't remember whether 0.99.2 did that.)


-David





Re: Handwritten sms.alert doesn't get executed

2006-12-01 Thread David Nolan
I sent this earlier, but I must have sent it from the wrong address
and it's sitting in a moderation queue...

-David

-- Forwarded message --
From: David Nolan [EMAIL PROTECTED]
Date: Dec 1, 2006 8:27 AM
Subject: Re: Handwritten sms.alert doesn't get executed
To: mon@linux.kernel.org


On 12/1/06, Steven Schubiger [EMAIL PROTECTED] wrote:
 Hi!

 I've been trying for quite a while to get a handwritten SMS alert executed by mon.
 Everything is fine if I open a terminal and run the script with the same
 parameters as defined in mon.cf -- the SMS gets sent. When run
 by mon, nothing happens.

 Looked through the mailing list archive and found some familiar threads
 which had some interesting remarks:

 I checked if
 * permissions are right (same as for all other alerts)
 * the interpreter line was valid (same as for all other alerts)
 * no absolute path specified (same as for most other alerts)
 * perl -c sms.alert emits no warnings (same as for all other alerts)

 Furthermore, I checked whether the script runs, but obviously it
 doesn't. I've examined the syslog and the output generated
 from mon when called with the debugging flag, but they leave
 me in a rather clueless state.

 Thanks in advance,
 Steven



Steven,

Some suggestions for you:

You said no absolute path specified, did you mean no non-absolute
paths specified?  i.e. if your script runs a program named foobar from
/usr/local/bin it should be calling it as /usr/local/bin/foobar, not
assuming /usr/local/bin is in $PATH.  (Alternatively you can set $PATH
in your script...)

Try adding some debugging code in your script.  i.e. if it's perl add
something like:
open(LOG, ">>/tmp/alertlog") or die "cannot open /tmp/alertlog: $!";
...
print LOG "got to step XXX\n";
...
print LOG "got to step YYY\n";


When testing your alert are you also passing the other options that
Mon sets when calling an alert?  i.e. '-g group -s service -h list of
hosts here', etc...  (See the Mon man page for full documentation.)
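
To make the last two suggestions concrete, here is a rough Python sketch of an alert skeleton that records what it was called with.  Only the -g, -s and -h options come from the description above; treating the host list as one whitespace-separated string, ignoring every other option, and the function names are assumptions of this sketch, so check the man page for the full set of arguments.

```python
import argparse
import time


def parse_alert_args(argv):
    """Pick out the group, service and host list mon passes to an alert."""
    # add_help=False because -h carries the host list here, not "help".
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-g", dest="group", default="")
    parser.add_argument("-s", dest="service", default="")
    parser.add_argument("-h", dest="hosts", default="")
    # Anything else on the command line is ignored in this sketch.
    known, _extra = parser.parse_known_args(argv)
    return {
        "group": known.group,
        "service": known.service,
        "hosts": known.hosts.split(),
    }


def log_step(logfile, message):
    """Append a timestamped line so you can see how far the alert got."""
    with open(logfile, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")
```

Calling log_step() as the first statement of the script, with the parsed arguments in the message, tells you whether mon starts the alert at all and with which parameters.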

What user do you run mon as?  Have you tested su'ing to that user and
running the script?

Can you post a copy of your script for us to look at?  (without any
SMS numbers, of course...)

-David



Re: Questions about snmpmonitoring and ALERT / UPALERT

2006-10-13 Thread David Nolan


--On October 13, 2006 9:41:52 AM -0400 Bill Chmura [EMAIL PROTECTED] 
wrote:


 Hello,

 Yesterday I installed two temperature sensors in my server room.  I set
 them both for 10 degrees higher than the current.  Well, the building
 people raise the temperature up at night to save on energy.

 I do have my own cooling system in there, but it did not compensate for
 the building raising and set off the alarms.

 My threshold was for 75 degrees and the peak it went up to was 76.
 Unfortunately it paged me around 75 times last night.


Ah, I believe you've just learned the first lesson of monitoring...  Never 
enable paging on a new test/service until you've run the monitoring test 
for a while first.



 So it basically went like this:

 ALERT (temp 75.7)
 UPALERT (temp 75.3)
 ALERT (temp 75.4)
 UPALERT (temp 75.6)
 etc, etc...

 All of these are above the stated MAX limit of 75.  For some reason,
 every other one is coming as good news - even though the temperature
 could have gone up.


 I am going to spend part of today ensuring I can sleep tonight (first by
 raising the MAX temp) by solving this - but if anyone has any thoughts
 on this - i would love to hear them.


I have a suspicion of what's going on here.  I believe the current mon 
version has a feature (or bug, depending on your point of view) where the 
UPALERT summary and detail messages are actually from the last failure, not 
from the OK test.  I suspect the temperature was actually crossing the 
threshold repeatedly, something like this:

test 1: 75.7 -> ALERT 75.7
test 2: 75.3 (no alert)
test 3: 75 -> UPALERT 75.3
etc...

There has been debate in the past about whether providing the 'last 
failure' content is useful for indicating what failure ended, or is 
confusing because it looks like it's saying that state is OK.  I feel it's 
confusing, and at CMU we're running with a patched mon that provides the 
success output during an upalert.

I can't remember right now whether a decision was made about changing this 
behavior.  If we decided to change it, the change must have gotten missed 
during one of the big merges between Jim's alert structure rewrites and my 
behavior changes.

So, the messages you got were confusing, but the temperature was probably 
crossing your threshold repeatedly.  You might want to experiment with 
putting a longer threshold in place before you alert, i.e. 'alertafter 3'. 
Or you could de-bounce the monitor test somehow.  Maybe configure it with 
two values, a low-water mark and a high-water mark, and exit with different 
exit codes. e.g. use exit code 1 when temperature is 75-78, exit code 2 
with temperature over 78.  Then you could only send email on temperatures 
in the 75-78 range, and page on temps over 78.
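
A minimal sketch of the two-water-mark idea in Python, using the 75 and 78 degree figures from above.  The function name is made up, and treating a reading of exactly 75 as OK is my assumption (it matches the example sequence earlier); how you map exit codes 1 and 2 to email versus paging is up to your mon.cf.

```python
def temperature_exit_code(temp, low=75.0, high=78.0):
    """Map a temperature reading to a monitor exit code.

    0 = OK, 1 = above the low-water mark (worth an email),
    2 = above the high-water mark (worth a page).
    """
    if temp > high:
        return 2
    if temp > low:
        return 1
    return 0
```

A monitor would read the sensor, call this, and pass the result to sys.exit().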

-David





Re: SNMP monitoring

2006-09-18 Thread David Nolan
On 9/15/06, Bill Chmura [EMAIL PROTECTED] wrote:
 Hey,

 I've been muddling my way through getting the SNMP working with Mon, and
 I am happy to report that I have had more trouble with finding the right
 MIBS than I have with getting MON to work with them.  Good job!

 Some thoughts after going through this process.  (I am running CVS from
 last week sometime).  This is all regarding snmpvar.monitor.

 * The contrib directory has up to 1.4 in it, but Mon comes with 1.6.
 Should something be noted in the contrib area that it's being maintained
 in the main distribution?

The contrib directory on www.kernel.org is actually a bit out of date,
the mon-contrib area in CVS has several newer scripts.

Since snmpvar.monitor has been integrated into the primary
distribution I've removed it from the CVS contrib area (just now...).



 * The readme for it refers to having UCD SNMP installed.  I found that
 in late 2000 it changed its name to NET-SNMP.  Still works fine, but it's
 easier to find in package management than UCD SNMP.  Should it be changed?


I just committed a fix to the readme.


 * The readme also instructs you to copy snmpvar.def, snmpvar.cf to your
 mon etc directory.  These are not in the main mon package.  I found them
 in the contrib tgz for the last snmpvar and used those.  They worked
 fine, but the directions should probably be updated or the files included.


The files are included in the mon package, in the etc directory.



 Anyway, I would be more than happy to put it all together and send
 someone updates they could drop into cvs.  I'd love to contribute
 something back to the project.


Feedback like this is already a useful contribution.  Not every
contribution has to come in the form of code updates... :)

-David



Re: New release?

2006-09-08 Thread David Nolan
On 9/7/06, Bill Chmura [EMAIL PROTECTED] wrote:

 If someone wants to update the tag on the mon-client so the new stuff
 that fixes mon.cgi is in, I would be more than happy to roll a few
 tarballs so there could be a new release.


I'd actually already moved the tag, but was waiting for Jim to put a
release out.

However since he hasn't gotten to it and there is clearly demand for
it, I'll at least publish a release candidate.  I've placed mon and
mon-client 1.2.0-RC1 files here for review:
http://www.managedandmonitored.net/mon/

-David



Re: rotate Downtime log

2006-09-08 Thread David Nolan
No, Mon does not currently support this format.  Using logrotate is
probably an OK approach, but I suspect that you would need to restart
Mon to get it to close the file and create a new one.  (Haven't
confirmed that, but I don't think it re-opens the file every time...)

A better answer would be to add log rotation support to Mon so that at
a rotation time it doesn't lose all knowledge of past failures.

-David

On 9/4/06, pingouin osmolateur [EMAIL PROTECTED] wrote:
 Hi everybody
 Can I use this format to rotate downtime log or
 something equal?
 logdir = /var/log/mon%YEAR-%MONTH

 Or is there another solution?  I know I can use
 logrotate.
 Thanks in advance
 ac








Re: Alert After not working for me...

2006-09-08 Thread David Nolan
On 9/8/06, Bill Chmura [EMAIL PROTECTED] wrote:
 I've recently spent a lot of time overhauling my mon.cf.  I moved to m4
 macros which I had been meaning to try (I recommend them to anyone who
 has not tried them for mon.cf).


(Note to self: I really need to put together a public release of the
system we use at CMU for maintaining our mon config file.  It's a
complete database driven web app for maintaining a large mon
config...)






 Basically, I was thinking for a few services that are touchy to have the
 system regularly test every 30 minutes.  But if it has a failure to test
 every minute.  Then issue an alert if it fails 5 times in one minute.

Is that a typo?  How can it fail 5 times in one minute if you're only
testing once every minute?

Since you didn't include a mon.cf snippet I'll have to guess a bit
here about what's going on.

I suspect you're trying to describe something like:
...
interval 30m
failure_interval 10s
period 
  alertafter 5 1m
  


I think you're trying to use the two-argument form of alertafter in a
way other than intended.  The two-argument form is to detect
intermittent failures, i.e. 'alertafter 2 6h' would alert if a service
fails twice in six hours.  In the case of an intermittent failure a
single failure would only result in two tests at the faster test rate
before returning to the regular test rate.

For what you're describing I think you want either 'alertafter 5'
(i.e. 5 consecutive failures) or 'alertafter 50s' (i.e. alert when a
service has failed every test for 50 seconds)
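
If it helps, the three forms can be thought of as three different predicates.  The Python below is only a toy model of the semantics as described in this message, not mon's implementation; the function names and data shapes (a list of pass/fail results, failure timestamps in seconds) are invented for the sketch.

```python
def consecutive_failures(results, count):
    """'alertafter N': the last N test results are all failures.

    results is a list of booleans, True meaning the test succeeded.
    """
    return len(results) >= count and not any(results[-count:])


def failures_in_window(failure_times, now, count, window):
    """'alertafter N T': at least N failures within the last T seconds."""
    recent = [t for t in failure_times if now - window <= t <= now]
    return len(recent) >= count


def failing_for(first_failure_time, now, duration):
    """'alertafter T': every test has failed for at least T seconds.

    first_failure_time is None while the service is OK.
    """
    return first_failure_time is not None and now - first_failure_time >= duration
```

So 'alertafter 2 6h' corresponds to failures_in_window(times, now, 2, 6 * 3600), which is true even if the service recovered between the two failures.  That is why it suits intermittent problems rather than "N failures in a row".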


-David



Re: A question about mon.cgi and monshow

2006-09-05 Thread David Nolan
On 9/5/06, Bill Chmura [EMAIL PROTECTED] wrote:
 I installed recently the latest cvs (1-2-0) of both the monitor and of
 the client.


 MON.CGI
 ---
 I put mon.cgi in my web server, but when I run it - basically it spits
 out into the logs:

 Cannot locate object list_views via package Mon::Client at .
 mon.cgi line 2175 GENO line 1.


I would swear I committed that code to CVS at one point, but it's not there.

I just committed changes to Mon::Client to provide the view related methods.

(This is a relatively new feature in mon where filtered client views
are implemented in the server, rather than having to be implemented in
every client.)



 I also have a problem with fping and unidentified output, but we talked
 about this before so I am going to go stfw and archives on that one.


I may have a different version of fping.monitor that has the code
necessary to handle this output, can you tell me exactly what extra
output from fping you're seeing?

-David



Re: CVS Access broke?

2006-09-01 Thread David Nolan


--On Thursday, August 31, 2006 12:16:25 -0400 Jim Trocki 
[EMAIL PROTECTED] wrote:

 On Thu, 31 Aug 2006, Bill Chmura wrote:

 Which version is recommended at this point?

 this should do you well:

 ftp://ftp.kernel.org/pub/software/admin/mon/devel
  mon-1.1.0pre1.tar.gz
  mon-client-1.0.0pre2.tar.gz



He really should be using at least mon-1-1-0pre3.  There were a couple of 
significant bugs in pre1, and there have been a couple of minor fixes since 
then.  Jim, if I tag the current code as mon-1-1-0pre4 and 
mon-client-1-1-0pre3 can you put up tarballs of both of those, and maybe of 
mon-contrib as well?  If you don't have the time I can put up images 
somewhere else.

-David




Re: CVS Access broke?

2006-09-01 Thread David Nolan
On 9/1/06, Jim Trocki [EMAIL PROTECTED] wrote:

 ok sure, and i guess we should just fork it and call the branch 1.2, or 2.0.
 the head trunk we can begin calling 1.3 or 2.1, following the odd #s devel,
 even #s stable paradigm.  i can take care of that and the updates to the web
 page and other related stuff sometime within the next week.


OK, following that convention I've just tagged the current CVS as
mon-1-2-0 and mon-client-1-2-0 respectively.  I haven't created a
branch from that tag yet, but we can do so if you want.  (If we're
just going to be doing minor bug fixes for a while there probably is
no need to branch just yet.)

Please create tarballs and publish to both ftp.kernel.org and the
sourceforge files area.


-David



RE: Question on Redistribute

2006-08-25 Thread David Nolan


--On Thursday, August 24, 2006 14:54:05 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:

 - Would it be possible to just send one everything is ok trap for a
 new overall check?  Maybe a new monitor script that queries itself to
 see if there are any existing problems and will alert based off that?
 - I'd also continue to send an alert per service if a new service
 problem is detected.
 - On the corporate server, I'd only set up one service per store
 entry that would have the traptimeout monitor (to watch for the
 network outages) but still have a service entry for each server to catch
 any of the specific service outage traps that would be received.


One scenario I can envision that would work which may be what you're trying 
to describe here is:
- Services at remote sites monitored at desired frequency (10s), traps sent 
to corporate via alert/upalert, i.e. only during failures.
- On the real services do not configure an alertevery option, so traps are 
resent every 10 seconds, in case the UDP packet is dropped.
- You probably would also want a startupalert configured here to set the 
initial status to OK on the corporate server.
- Add one fake service that always returns an OK result, run it once per 
minute and redistribute the status to corporate.  For this service only you 
would want traptimeout configured at corporate.
- Possibly add monitoring of the remote sites from the corporate server, 
including monitoring of Mon itself.


-David



Re: Flapping monitor

2006-08-24 Thread David Nolan


--On Thursday, August 24, 2006 13:58:49 +0200 Emilio Mira Alfaro 
[EMAIL PROTECTED] wrote:

 Hi list,

 I'm trying to configure mon to alert when one of our routers interfaces
 flaps 3 times during 30 secs.  I also would like mon not to send more
 than 1 alert every 30 minutes I came up with this config:

 watch mad_log_flapping
 service path_a
 description flapping on path_a
 period wd {Mon-Sun}
 alertafter 3 30s
 alertevery 30m
 #trapduration 30s
 alert mail.alert email_address

 I'm redirecting SNMP traps from the router to mon using snmptrap2mon.pl.
 The thing is that if I redirect linkUp and linkDown traps, the service
 never comes down and mon never sends an alert even when there are more
 than 2 transitions (linkDown  linkUp). If only linkDown traps are
 redirected, mon sends the alerts as it should but the service is always
 down (it shows up in red on mon.cgi) after a flapping occurs, which
 bothers me. This is mainly because no linkUp traps are redirected. I've
 tried the option trapduration 30s but on the latest CVS release mon
 complains with unknown syntax [trapduration 30s], line 59.



trapduration is a configuration option that belongs in a service 
definition, but outside of a period definition.  The current Mon code is 
much more strict about options that are misplaced, where earlier versions 
would just ignore those options.


 I'd like to have the service on green while there is no flapping and, if
 there is flapping (3 interface transitions during 30 secs), put in on red
 during 30 min and bring it back to green if no more flapping happens.


trapduration won't quite get you this behavior.  When the trap status 
expires the service will go back to the untested state, not the OK state. 
This may be acceptable to you...

To get the behavior you describe you need to make snmptrap2mon.pl send a 
mon trap with status OK for linkUp traps.  I suspect it's currently sending 
failure alerts for both linkDown and linkUp traps.

-David




Re: Question on Redistribute

2006-08-24 Thread David Nolan


--On Thursday, August 24, 2006 10:14:48 -0400 Jim Trocki 
[EMAIL PROTECTED] wrote:

 On Thu, 24 Aug 2006, David Nolan wrote:

 --On Thursday, August 24, 2006 08:21:16 -0500 Tim Carr
 [EMAIL PROTECTED]

 The problem is that we're going to need to turn the monitoring period
 for several of the remote site monitors in each location way up - like
 checking every 10 seconds (i.e., interval 10s).  That means we're going
 to see a huge increase in the number of traps we're seeing at the
 corporate site.

 Or we could implement a redistributeevery option, similar to alertevery.
 That wouldn't be too hard, but would take a little work.

 yeah the issue here is the processing and communication overhead of
 dealing with the traps sent remotely. it would make sense to batch up the
 10s traps from the remote systems and send them out in a bundle say, once
 every minute, and that would, you know, save you 6x the processing
 overhead on the remote mon server, or at least give you a way to control
 the processing overhead to suit your needs.

 this use case might mean that it would make sense to move the remote trap
 stuff into the mon server itself, rather than implement it with the trap
 alert. the trap alert is a nice simple abstraction that works well for
 the simpler cases, and an elegant way of extending the functionality of
 mon without having to change the server code, but at the cost of
 efficiency. you would really want the ability to batch up only the trap
 transmissions rather than all alerts. for example, schedule a trap
 queue flush every minute performed by the mon server rather than in the
 trap alert.


I could see benefits to that capability, in addition to the current 
redistribute support.

My original idea for redistribute was that it could be used to integrate 
mon with other systems as well, because it's just an arbitrary script that 
you can provide.  i.e. it could send status updates to OpenView, or log 
status updates to a database, or anything else you might want.  The ability 
to use it for integration with remote mon servers is just a bonus...

 then this brings up the issue of trap processing overhead on the rx end.
 i wonder if the behavior would be acceptable by just processing the trap
 receptions serially, the way it is done now, or if it would require a
 change in processing method to scale it up efficiently.

For the record, my master server is a 2.8GHz P4, and basically runs at zero 
load while processing the trap load I described earlier, and running a few 
tests of its own.  I'm sure there is a limit to reasonable trap load, but 
we haven't hit it yet.


 this probably requires much more thought and a better understanding of the
 usage scenario.


I agree.  I suspect Tim's usage scenario involves large numbers of servers, 
each monitoring a relatively small environment, so I doubt he'll have any 
processing load problem.  But we're not quite sure of the scale of Tim's 
setup.

-David





RE: Question on Redistribute

2006-08-24 Thread David Nolan


--On Thursday, August 24, 2006 10:18:56 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:

 4000 traps/second.  That sounds like a whole lot to me.


Holy cr** that's a lot of traps.  Wow, the interesting ways that mon gets 
deployed continue to amaze me...

Even if you were only sending one trap per minute per service you would 
have:
25 services * 1 trap/minute * 2 servers * 1200 sites = 60000 traps/minute, or 
1000 traps per second.

That's still a *lot* of traps.  Doing your bandwidth math shows that it's still 
1.6Mbps of trap traffic.
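
A quick way to sanity-check the trap arithmetic is a few lines of shell.  This is only a sketch: the function names are made up for illustration, and the 200 bytes per trap is not a figure from the thread, just what 1.6Mbps at 1000 traps per second implies.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative, not part of mon.

# traps_per_second SERVICES TRAPS_PER_MINUTE SERVERS SITES
# Integer traps per second across the whole deployment.
traps_per_second() {
    echo $(( $1 * $2 * $3 * $4 / 60 ))
}

# trap_kbps TRAPS_PER_SECOND BYTES_PER_TRAP
# Bandwidth in kilobits per second (1 kbit = 1000 bits).
# BYTES_PER_TRAP is an assumption supplied by the caller.
trap_kbps() {
    echo $(( $1 * $2 * 8 / 1000 ))
}
```

For example, `traps_per_second 25 1 2 1200` prints 1000, and `trap_kbps 1000 200` prints 1600 (1.6Mbps).  With 4 traps per minute per service the first function prints 4000, the figure Tim quoted.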

I think you might want to make your mon setup more structured, with 
intermediate collection points that pass status changes only to your final 
collection point.

-David





RE: Problem getting traps to work correctly

2006-07-13 Thread David Nolan



--On Thursday, July 13, 2006 14:01:58 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:



A question on the redistribute option, though - I'm not sure I can
follow how the configuration works.  For example, my current remote
server config is:


redistribute is a service level config option, not a period option.  For 
example:

watch Store13-2
   service DRBD_Status
      interval 15s
      monitor DRBDCheck.monitor -s you
      description Is\ DRBD\ working\ there?
      redistribute trap.alert mainmonitor


-David





RE: Problem getting traps to work correctly

2006-07-13 Thread David Nolan



--On Thursday, July 13, 2006 14:20:38 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:



Gotcha.  I threw that in, and it seems to work correctly, except I can't
tell if it is or not.  I'm watching the log file, and it shows alerts
being sent on an up/down event, but I'm not seeing alerts every 15s
showing up when things are working correctly.  Is that expected
behavior?

Thanks,
Tim



I refer to the server that sends the traps as a slave server, and the 
server collecting the traps as the master server.  Your master server 
should receive a trap on every status update on the slave server, i.e. a 
trap every 15s in your example.  The master should only alert based on its 
alert behavior.  This makes receiving updates via traps almost functionally 
equivalent to other monitor tests that you run on your master server.


If that's not the behavior you're seeing please let me know.

-David



Re: Problem getting traps to work correctly

2006-07-13 Thread David Nolan
On 7/13/06, Tim Carr [EMAIL PROTECTED] wrote:
 Here's a bit more information on it.  I've got the slave server
 configured for multiple services, each of them using the redistribute
 option:

redistribute alert trap.alert mainmonitor


If that's an exact quote you've got the option wrong.  It's just
redistribute trap.alert mainmonitor.

 On the master server, once I've reset it, none of those servers will
 ever go green/good in mon.cgi - they stay in blue/unchecked status.


That sounds like you've still got the period based trap configuration
in place.  (Which would match with the above typo.)

If that's not true, and the line above was a typo in the email not the
configuration, then maybe the redistribute code in CVS is broken.
Before I go investigate that possibility please confirm whether the
line above was an exact quote from your config file.

 In the slave server, the history file shows this for an outage event:

 alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert
 (mainmonitor) DRBD_Not_Running
 upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert
 (mainmonitor) DRBD_Not_Running


This also indicates to me that your old alert/upalert configuration is
still in place, because redistribute does not generate history
entries (doing so would bloat the history file on the slave
server).
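
A quick way to check a config file for the stray keyword is to grep for it.  A minimal sketch, with a made-up function name:

```shell
# Sketch only: find_bad_redistribute is a hypothetical helper, not a mon tool.
# Prints (with line numbers) any config line of the form
#   redistribute alert <script> ...
# which carries the stray "alert" keyword from the period syntax.
# Exit status is grep's: 0 if such a line was found, 1 if not.
find_bad_redistribute() {
    grep -nE '^[[:space:]]*redistribute[[:space:]]+alert[[:space:]]' "$1"
}
```

Run it against your own config file, e.g. `find_bad_redistribute mon.cf`.  Note that it would also flag a redistribute script that really is named "alert", so treat a match as a hint rather than proof.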

-David



Re: mon-alert-script won't execute

2006-07-05 Thread David Nolan



--On Wednesday, July 05, 2006 13:46:44 +0200 Felix Leiter 
[EMAIL PROTECTED] wrote:



mon recognizes at 03:29:37 and 03:29:47 that the port 8080 is closed and
calls the squid.alert at 03:29:47 but then nothing happens. I don't know
where the misconfiguration is.

I try to change the squid.alert-script to this:

# !/bin/sh
#
#
/etc/init.d/squid start

the acl for squid.alert is set to 755, this should also be alright.

does anyone have any suggestions?

kind regards


Felix,

What user is mon running as?  If it's not running as root it probably cannot 
restart squid.


What OS are you running?  Is the squid init script refusing to start squid 
because it thinks it's already running?  Have you tried running your script 
by hand and seeing whether that restarts squid?


In your message the first line of the script is '# !/bin/sh', while it 
should be '#!/bin/sh'.  i.e. no space between # and !.  But that might have 
just been a typo in your email...
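
If you want to rule out the shebang problem quickly, something like the following would do it.  This is a sketch with a made-up function name; it only checks that the file starts with the two bytes "#!", nothing more.

```shell
# Sketch only: has_valid_shebang is an illustrative helper name.
# Succeeds if the first two bytes of the file are exactly "#!".
# (head -c is widely available but not strictly POSIX.)
has_valid_shebang() {
    [ "$(head -c 2 "$1")" = '#!' ]
}
```

Usage would be along the lines of `has_valid_shebang squid.alert || echo "bad first line"`.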


-David



Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread David Nolan



--On Thursday, April 20, 2006 09:10:53 -0400 Brendan Mullen 
[EMAIL PROTECTED] wrote:



I was migrating our instance of Mon to a new machine running a newer
version.

The problem was traced to a locally modified version of the qpage alert.
If I had used the qpage.alert that shipped with the version of Mon I was
using,  I would have been fine.

The locally modified qpage.alert worked on an older version of Mon, but
not 1.0pre5.  The page would be sent but never show up in the alert
history, and then would be sent again, and again...


Interesting.  Can you explain the cause of the failure?  Was qpage.alert 
exiting with an error code that made Mon think it needed to retry the alert?



-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University



Re: pager alert continues to page multiple times after single failure

2006-04-19 Thread David Nolan



--On Wednesday, April 19, 2006 12:09:30 -0400 Brendan Mullen 
[EMAIL PROTECTED] wrote:



Hello,

I'm using mon-1.0.0pre5, the mon-client-1.0.0pre5 and mon.cgi2.2

When an alert is triggered, it continues to page every minute like it is
paging on every watch  interval.




The config looks fine.  Is your monitor's summary line changing?  (That 
would cause a re-alert).


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: monitoring parameters

2006-02-23 Thread David Nolan



--On Wednesday, February 22, 2006 16:46:59 -0600 Nate Reed 
[EMAIL PROTECTED] wrote:



I'm not sure if I have set the monitoring parameters correctly for what I
want  to do.

First question: is the monitoring interval the frequency that mon runs
the  monitor, or does it define something else?



I hate to quote the documentation, but from the manual:

interval timeval
   The keyword interval followed by a time value specifies the frequency
   that a monitor script will be triggered.


So 'interval 30s' means that mon will run the monitor test every 30 seconds.


It seems like MON is forgetting about the previous alert after the
monitoring interval has elapsed (MON_FIRST_FAILURE and MON_LAST_FAILURE
are  equal even though there were numerous failures).  Is that what's
supposed to  happen?



First and last failure should be the same in certain cases, depending on 
how long the failure has been happening.  First failure is an indication of 
when the current failure started; last failure is an indication of when the 
most recent monitor test was run.  So if your interval is 5 minutes, for 
the five minutes immediately following the first detection of a failure 
first and last will be the same.
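
An alert script can work out how long the failure has lasted from those two values.  A minimal sketch, assuming MON_FIRST_FAILURE and MON_LAST_FAILURE are set in the alert's environment and hold Unix epoch seconds (check the man page for your version); the function name is made up:

```shell
# Sketch only: failure_seconds is an illustrative helper.
# Assumes both variables hold Unix epoch seconds; unset values count as 0.
failure_seconds() {
    echo $(( ${MON_LAST_FAILURE:-0} - ${MON_FIRST_FAILURE:-0} ))
}
```

With a 5 minute interval this prints 0 on the first failed test and 300 on the next one, which matches the behavior described above.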




Ideally, my monitor would run very frequently (every few seconds), but
the  monitoring interval would be longer, like 30 minutes.  Upon a
second  failure during the monitoring interval, my alert script will try
to take a  different action than on the first failure.  Is this possible
through Mon's  configuration (without building this logic in my script)?



You can do this.  The interval setting configures the testing behavior, the 
alert period definitions configure the alerts (actions) that will occur. 
You can have multiple periods with different behaviors for different 
failure lengths or different times of day.


For example, look at these two periods:

period first_action: wd{Sun-Sat}
 alertafter 1
 alert some.alert.script -some -arguments
 numalerts 1
period second_action: wd{Sun-Sat}
 alertafter 30m
 alert some.other.alert.script -some -arguments
 alertevery 30m


Those would run some.alert.script immediately whenever a failure occurs, 
and some.other.alert.script after the failure has been continuous for half 
an hour and every half hour after that.



See the manual for full information on all the alert control semantics that 
are available.


-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University



Re: capture ip of falling node

2005-11-29 Thread David Nolan



--On Tuesday, November 29, 2005 10:22:33 +0100 Andrés Cañada 
[EMAIL PROTECTED] wrote:



Hi! I posted this message in another list and I think this is the right
place  for it.


Andrés Cañada wrote:
 Hi all!
 I have a cluster working with heartbeat and ldirectord, systemimager,
 ganglia and mon. It's working nice already (thanks to everybody in this
 list!). I use Mon to monitor the cluster nodes with snmpd.
 When one of the criteria is positive, then Mon sends me an alert to my
 mail. That's great!!
 But now I'd like to be able to capture that signal sent by Mon to run a
 script. I don't know if I'm explaining this well. When, for example, a node
 fails a ping check, I'll receive an e-mail notification, but I'd
 like also to be able to capture this signal to run a script.
 Can anybody tell me if snmptrapd is ideal for this issue to solve?
 Is there a HOWTO for this?

 thank you very much and sorry for my english.
 Andres

Why don't you just write your own alert script for Mon and
have Mon run it?



Thanks for your answer. I'd like it to be that easy but I'm afraid it
isn't. In my case I need to know the IP of the failing node and then
trigger a script that does something with the rest of the nodes (I need
to modify the setup of an MPI universe).
It seems to be possible to do this since the ip of the falling node is
received via mail.
Any ideas?
Should I need to use snmptrapd??
Thank you very much
Andrés.


I think you missed the point of the previous response.  The mail you're 
getting from Mon is being generated by an alert script.  If you think there 
is enough information in that mail to take an action on the failing node, 
then there is enough information available to an alert script to just have 
the script take the action.


Alert scripts in Mon are simply programs which take information in specific 
ways from Mon and perform some form of action on that data.  They can be 
shell scripts, perl programs, C programs, etc.  They can take whatever 
action you deem reasonable.  The command line arguments and environment 
variables that are passed to alert scripts are documented in the Mon man 
page.
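
To make that concrete, here is a minimal sketch of an alert that acts on each failed host.  It assumes mon passes the failed hosts as a single space-separated argument to -h (along with -s service, -g group and -t time); check the man page for the exact options in your version.  The run_alert name and the echoed action are placeholders.

```shell
# Sketch only: run_alert and the "reconfigure" action are hypothetical.
# Assumes the alert is invoked as:  alert -s service -g group -h "hosts" ...
# The function body runs in a subshell so its variables stay local.
run_alert() (
    OPTIND=1
    hosts=
    # Leading ":" keeps getopts quiet about options this sketch ignores.
    while getopts ':s:g:h:t:uTO' opt; do
        case $opt in
            h) hosts=$OPTARG ;;
            *) ;;   # other options are accepted but ignored here
        esac
    done
    for host in $hosts; do
        # Replace this echo with the real per-host action.
        echo "reconfigure without $host"
    done
)
```

In a real alert file the last line would simply be `run_alert "$@"`.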



-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: alert does not execute script compleet

2005-11-26 Thread David Nolan



--On Saturday, November 26, 2005 4:51 PM +0100 gandalf istari 
[EMAIL PROTECTED] wrote:



Hi,

I have a problem with a self written alert.
This script must change two routing tables, exec a ssh command and write
a text to a log file. It does everything except the ssh command.
If I run the alert manually all works perfectly. This script is crucial in
our failover setup.


snip


# Change route at UCC to framerelay
ssh [EMAIL PROTECTED] /usr/lib/mon/alert.d/use-framerelay.alert



If I were a betting man, I'd bet money that ssh isn't in the PATH of the 
user that Mon is running as.  Try adding the full path to your ssh binary.
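
One way to stop depending on the PATH mon happens to run with is to resolve the binary explicitly at the top of the alert.  A sketch, assuming a made-up helper name and a directory list you would adjust for your system:

```shell
# Sketch only: find_in_path is an illustrative helper.
# find_in_path CMD DIR...
# Prints the first DIR/CMD that is executable, or fails with a message.
find_in_path() {
    cmd=$1
    shift
    for dir in "$@"; do
        if [ -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"
            return 0
        fi
    done
    echo "$cmd not found in: $*" >&2
    return 1
}
```

In the alert you would then write something like `SSH=$(find_in_path ssh /usr/bin /usr/local/bin) || exit 1` and call `"$SSH"` instead of plain ssh, so a missing binary fails loudly instead of silently.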


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: Failure do not become Alert !

2005-11-11 Thread David Nolan



--On Friday, November 11, 2005 10:47:39 +0100 GioiaBa [EMAIL PROTECTED] 
wrote:

SNIP

   period wd {Sat-Sund}

SNIP

last day, the Router service went down, so 'Ext' watching began to fail..
the problem is that the failure length was 1h 39 minutes !! and never
became Alert.. so no Alert has been sent for that hour..  this would be a
great problem, as the service we are monitoring is our Router
connectivity..  any ideas on the reason why this could happen ?


Was the failure on a Saturday or Sunday?  Your period definition is for 
weekends only.  Perhaps you want 'period wd {Sun-Sat}'.  Or simply use an 
empty period definition, which always matches.




..and we also need to monitor the responding time of the service..
I mean the service 'fails' only if the fping did not respond in xxminutes
.. I've read before how to do it, but I can't find it right now..  Any
help would be appreciated..
thank you very much


I'm not sure exactly what you're asking.  If you want to control the 
detection behavior of a single service test, look at the command line 
options for the monitor scripts you're using.  For example, fping.monitor 
takes command line options that you can use to control the ping timeout 
behavior:

   -r num  retry num times for each host before reporting failure
   -s num  consider hosts which respond in over num msecs failures
   -t num  wait num msecs before sending retries

-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: Alerts coming too often... why?

2005-11-03 Thread David Nolan



--On Thursday, November 03, 2005 12:29:55 -0500 Bill [EMAIL PROTECTED] 
wrote:



I have it set to alert every 60minutes, but I get them about every 5
minutes.  In reading the doc's I noticed the results have to be the
same for each entry otherwise it resends.  I noticed the fping entries
in my log are different.


Yes, the default behavior is that if the summary of the failure changes a 
new alert should be generated.  If you're running the current Mon from CVS 
you can control that by saying 'alertevery 60m strict'.


Alternatively you could figure out why your monitor is generating 
inconsistent output.  Based on this string from your syslog output, 
"unidentified output from fping", I'm guessing your monitor script isn't 
correctly processing all of the fping output.  I believe you might need a 
newer version of the fping.monitor script.  If the latest version from CVS 
doesn't help, send us the version information for your version of fping and 
we'll see if we can fix it.
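
If you write your own monitor, one way to avoid this kind of re-alert is to keep the summary line identical whenever the same hosts are failing.  A minimal sketch, assuming the summary is the first line the monitor prints; the function name is made up:

```shell
# Sketch only: stable_summary is an illustrative helper, not part of mon.
# Reads host names on stdin (one per line, any order, possibly repeated)
# and prints them as one sorted, de-duplicated, space-separated line.
stable_summary() {
    sort -u | paste -s -d ' ' -
}
```

The same set of failed hosts then always produces the same summary, whatever order the underlying tool reported them in.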


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: Alerts coming too often... why?

2005-11-03 Thread David Nolan



--On Thursday, November 03, 2005 13:49:15 -0500 Bill [EMAIL PROTECTED] 
wrote:



So is the cvs relatively stable?  Mon is not mission critical stuff
here, so I'd be more than happy to run that on a bunch of machines.

Right now I am on 0.99.2

I was eyeing CVS the other day...  debating it.



CVS is definitely more stable than 0.99.2.  0.99.2 has some nasty bugs, 
including some crash and burn type bugs.


I need to spend some time integrating some last bug fixes to CVS and then 
we're ready to call it a release.


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: Main Groups?

2005-10-26 Thread David Nolan



--On Thursday, October 27, 2005 6:57 AM +0200 Frank 'eXplasm' Isemann 
[EMAIL PROTECTED] wrote:



and for example on s3 doesnt run a ftp server .. how can i exclude the
ftp service from this special server?




From the service definitions portion of the documentation:


exclude_hosts host [host...]
   Any hosts listed after exclude_hosts will be excluded from the service
   check.


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: A few bugfixes missing from fantabulous mon...

2005-10-21 Thread David Nolan



--On Friday, October 21, 2005 16:52:04 -0400 Ed Ravin [EMAIL PROTECTED] 
wrote:



Oh, but just ignore that last patch set, that's totally the wrong one.
I'm surprised no one tweaked me on it - maybe no one ever reads my mail
all the way down to the bottom?



I read it all, but hadn't gotten around to looking at the patches yet.  (I 
guess I was hoping Jim would.  He was probably hoping I would...  :)


I'll try to look at them this weekend.

-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: how does exclude_period work?

2005-10-11 Thread David Nolan



--On Tuesday, October 11, 2005 17:24:27 +0200 Sebastiaan Veldhuisen 
[EMAIL PROTECTED] wrote:



Hi David,


I just committed a new version of Mon to CVS with *UNTESTED* support for
a global exclude_period.  Download the latest from the sourceforge CVS
repository and put 'exclude_period = wd {Mon} md {8-14} hr {17-23}' into
your config file, next to the other global settings.  (You'll also need
the current version of Mon::Client, since there were some protocol
changes between your version (0.99.2) and the 1.1 series.)


That's great news! Thanks for the enhancement :) I'll test it in the
next days, but unfortunately i'm not allowed to use my own compiled code
in production machines. I'll have to wait until Suse updates its rpms.




Sorry to hear your employer ties your hands like that.  0.99.2 has some 
serious problems, including some that can trigger a perl bug that results 
in a perl segfault.




You're trying to put the exclude_period definition inside a period.  Put
it  above the first period definition and it should work.  (And in
current Mon  code this would generate a config file syntax error.)


I put exclude_period above all other period definitions and now i get
a syntax error and mon won't start. What do you mean with current? Do
you mean CVS?  Right now I'm using version 0.99.2. Does it mean it is
not possible to use an exclude period with my mon version?



I don't remember whether the 1.0pre* series has this fix, but in 1.1* an 
unrecognized option in the period section will result in an error.


Can you post a snippet of your current config and the resulting error 
message?


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: how does exclude_period work?

2005-10-11 Thread David Nolan



--On Tuesday, October 11, 2005 22:32:51 +0200 [EMAIL PROTECTED] wrote:


Quoting David Nolan :


Sorry to hear your employer ties your hands like that.  0.99.2 has some
serious problems, including some that can trigger a perl bug that results
in a perl segfault.


   Yeah I know. It's not that i'm not capable of it. The company chose
to stick to an Enterprise Linux version so they can get support
tickets from Suse on the software. Good news is, that this is my last
month working for them :o) I'll compile CVS on my Debian Sarge machine
and test your enhancement. I know about the problems with 0.99.2, but
so far (I'm lucky I guess) I haven't had any problems with
segfaulting.


The segfault bug is triggered by calling a text parsing function (from a 
standard perl module, Text::ParseWords) with particularly large input.  The 
ways I've seen this triggered are parsing monitor output and parsing trap 
input.  I'd bet money it could probably be triggered by a large client 
request, but I just fixed the problem by not using that routine any more.




   cf error: unknown syntax [exclude_period wd {Mon} md {1-7} hr
{17-22}], line 69




Oh shoot... Now that I go look at the code to find where that comes from I 
remember that 0.99.2 had a complete parsing bug on exclude_periods that 
prevented them from ever working.  Basically this code:
   elsif ($var eq "exclude_period" && inPeriod (time, $args) == -1)
   {
       close (CFG);
       return "cf error: malformed exclude_period '$args' (the specified time period is not valid as per Time::Period::inPeriod), line $line_num";
   }

needs to become this code:

   elsif ($var eq "exclude_period")
   {
       if (inPeriod (time, $args) == -1)
       {
           close (CFG);
           return "cf error: malformed exclude_period '$args' (the specified time period is not valid as per Time::Period::inPeriod), line $line_num";
       }
   }


The previous code always fell through to the else clause whenever the
period was valid, which is where the unknown syntax error comes from.

Jim was talking with me recently about actually designating something a 
stable version...  This seems like one more big reason to stop calling 
0.99.2 the stable version.  How about it Jim?  Call mon-1-1-0pre2 Mon 1.1 
and cut a release?


-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: not able to send sms alert

2005-10-05 Thread David Nolan



--On Wednesday, October 05, 2005 2:39 PM +0530 ankush grover 
[EMAIL PROTECTED] wrote:



hey friends,

I am trying to configure sms alerts for my servers.But I am getting the
errors

calling alert sms.alert for apache2/HTTP
(/usr/lib/mon/alert.d/sms.alert,my number) 192.168.1.68
http://192.168.1.68
Oct 5 14:28:42 linux mon[6664]: could not exec alert
/usr/lib/mon/alert.d/sms.alert: No such file or directory



Either the file /usr/lib/mon/alert.d/sms.alert doesn't exist, it's not 
executable, or the binary referenced in the first line (/usr/bin/perl) 
doesn't exist.  If you try to run the alert by hand you should see the same 
error.
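
Those three possibilities can be checked in order with a few lines of shell.  This is a sketch only; diagnose_alert is a made-up name, not a mon command.

```shell
# Sketch only: diagnose_alert is an illustrative helper.
# Checks the three causes listed above, in order, and prints the first hit.
diagnose_alert() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 1
    fi
    if [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 1
    fi
    # Interpreter named on the "#!" line, if any (first word after "#!").
    interp=$(sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1")
    if [ -n "$interp" ] && [ ! -x "$interp" ]; then
        echo "interpreter not found: $interp"
        return 1
    fi
    echo "ok: $1"
}
```

For example, `diagnose_alert /usr/lib/mon/alert.d/sms.alert` would report which of the three conditions applies, or "ok" if none does.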


However I suspect you have a bigger problem.  I suspect you have not yet 
read the README for sms.alert and realized that sms.alert requires having 
gnokii installed, and a Nokia cell phone connected to your computer.


You probably want to look at some other form of SMS transmission.  We use 
snpp.alert to talk to a SNPP server that dials a modem and sends a message 
via a TAPS/IXO dialup message transmission interface.  However many phone 
providers don't offer that service anymore, so we often use email to the 
various cellular providers' email/sms gateways.  i.e. [EMAIL PROTECTED], 
etc.


Personally I don't feel that relying on SMS messaging for mission critical 
notifications is a good idea.  We *primarily* use text messaging with 
SkyTel pagers, since SkyTel actually provides a reliable messaging service. 
Every cellular provider that I've checked says their text messaging service 
is for 'entertainment purposes only'.  i.e. not reliable for business 
purposes.


-David


David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: [Solaris 9] log output to terminal

2005-10-03 Thread David Nolan



--On Monday, October 03, 2005 11:54:40 +0200 Alexandre Pashai 
[EMAIL PROTECTED] wrote:



hi all,

mon daemon outputs logs into logfile (normally).
On Solaris 9, the log is sent to other terminals...that's annoying.

what's the matter ??

thanks fro replies



Mon is just using syslog.  Either you have mon syslog'ing to a facility 
that gets re-broadcast everywhere, or you have a syslog.conf that is 
sending too much information to the user terminals.



From the mon manual:


syslog_facility = facility
   Specifies the syslog facility used for logging. daemon is the default.

Check your syslog.conf to see how logs from daemon are configured.

-David


David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: failed logins monitor

2005-09-26 Thread David Nolan



--On Monday, September 26, 2005 13:58:01 +0200 Administrator Chat-Net 
[EMAIL PROTECTED] wrote:



hi all,

on the webpage of intrusion[1] i saw that they have a login_failure
monitor. is that monitor still available or is there another that
replaces it?

thx for reply

greetz

[1] http://www.intrusion.com/knowledge/article.aspx?ID=611166



My impression from reading that site is that the monitor scripts referenced 
are proprietary scripts written by Intrusion Inc., provided as part of the 
SecureNet Sensor product they sell.


I'd guess that their script wouldn't be useful outside of their box anyway, 
since it probably is looking at pre-collected data from their system.


For a general purpose monitor script you'd probably want something that 
parses syslog output.  There is a syslog.monitor included with mon that 
serves as a syslogd replacement, but I've never personally used it.  (I 
didn't like the 'must  replace syslogd' requirement..)


I have a similar tool which watches the syslog log files and pattern 
matches on the output, generating mon traps as necessary.  I could probably 
add it to the mon CVS area if anyone is interested in using it...


-David


David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: mon configuration

2005-09-08 Thread David Nolan



--On Thursday, September 08, 2005 10:59 AM -0400 Allan Wind 
[EMAIL PROTECTED] wrote:



On 2005-09-08T16:07:16+0200, Graf László wrote:

I am using a shell script wich runs in the background.  How should I
configure mon to alert me if the process hangs up or fail in
operations?


If by "hangs up" you mean the process dying then you could use the
ps.monitor that is in contrib.  If you mean it stops working as expected or
hangs then it is an unresolved problem in general, and you need to look
into making decisions based on timeouts.  For instance have your script
touch a file then write a monitor to alert you if that file is too
old.  Should you mean a signal perhaps you want to trap that?


/Allan



In addition to Allan's suggestions I would also suggest looking into Mon 
traps.  If your long lived process runs a program that generates a Mon trap 
every X period of time (1 minute, 10 seconds, whatever...) then you could 
have a trap timeout in Mon to detect when it stopped generating traps.
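
For the touch-a-file approach Allan describes, the monitor side could look roughly like this.  It is a sketch under a couple of assumptions: the function name is made up, the monitor reports failure with a non-zero exit status and a one-line summary, and your find supports -mmin (GNU, BSD and busybox do, but it is not POSIX).

```shell
# Sketch only: heartbeat_check is an illustrative monitor body.
# heartbeat_check FILE MAX_AGE_MINUTES
# Prints a summary and returns non-zero if FILE is missing or too old.
heartbeat_check() {
    if [ ! -e "$1" ]; then
        echo "$1 missing"
        return 1
    fi
    # find prints the file only if it was modified more than $2 minutes ago.
    if [ -n "$(find "$1" -mmin "+$2")" ]; then
        echo "$1 stale"
        return 1
    fi
    return 0
}
```

The long-running script would touch the file on every pass through its main loop, and the monitor would call `heartbeat_check` with a maximum age comfortably longer than one pass.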


-David


David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: monitor for ssl services?

2005-08-20 Thread David Nolan



--On Saturday, August 20, 2005 2:41 PM +0200 Miolinux [EMAIL PROTECTED] 
wrote:



Hi, i searched and googled quite, but i didn't find a monitor for
monitoring ssl services (i needed mail server one's: smtps,imaps,pop3s)
[SSL not TLS] and didn't want to create an ssl tunnel for each one of
them, so i modified tcpch.monitor and merged it with some parts of an
imap-ssl.monitor that i found.

I have been running it for some weeks now and it seems to work, but i bet i
made some error since i'm not a perl programmer.

However since may be interesting (and someone could take a look at it)
i'll attach the code.

Ps. if someone does know an alternative to this code don't hesitate to
talk! ;)




I guess I never got around to adding CMU's imap tests to the contrib area. 
I've done that now, at least in CVS.  As soon as the public copy of the 
sourceforge CVS repository is updated you'll find an imap directory visible 
here http://cvs.sourceforge.net/viewcvs.py/mon/mon-contrib/monitors/ 
which will contain three tests, one for IMAP over SSL, one for IMAP with 
STARTTLS, and one for plain text password authentication over IMAP.  (The 
PTP test has support for a new monitor-auth.cf file to specify username and 
password, but I haven't added the documentation for that to the Mon 
repository yet.  I'll work on that.  It also can take user/password on the 
commandline.)


The IMAP over SSL test has support for alerting when an SSL certificate is 
expired, or about to expire.  We run two services with this test on our 
servers, one without certificate notification, and one with.  The one 
with certificate notification enabled is configured never to page, it just 
sends mail 10 days before the cert is going to expire.
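
For anyone who wants a similar "about to expire" check without the CMU test, here is a minimal sketch using the openssl command line.  It is not the CMU monitor, the function name is made up, and it works on a PEM file you already have on disk rather than fetching the certificate from the IMAP server.

```shell
# Sketch only: cert_expires_within is illustrative and is not the CMU test.
# cert_expires_within PEM_FILE DAYS
# Succeeds (exit 0) if the certificate expires within DAYS days.
# Note: an unreadable file also counts as "needs attention".
cert_expires_within() {
    # "openssl x509 -checkend N" exits 0 when the certificate is still
    # valid N seconds from now, so the result is inverted here.
    ! openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null 2>&1
}
```

A service that only sends mail could then run something like `cert_expires_within cert.pem 10` and report failure when it succeeds.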


I think I'll go look and see what other monitors I have now that I should 
export...  I've probably got a dozen or so to add.  (Plus the docs for the 
monitor-auth.cf syntax...)



-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!



Re: mon writes to /var/log/messages

2005-08-17 Thread David Nolan



--On Wednesday, August 17, 2005 1:08 PM +0200 Grames Gernot 
[EMAIL PROTECTED] wrote:



Hi,

i found out that mon writes a lot of messages to the
/var/log/messages file during monitoring.

How can i stop this?? It fills up my harddisk!

Thank you!




By reading the documentation for mon and syslog, and picking a 
configuration which suits your needs.  By default mon logs to the 'daemon' 
syslog facility, and logs various messages at the debug, info, notice, 
alert, err, and crit syslog levels.  I suspect you're logging daemon.info 
and higher messages to your messages file.  Either log only higher level 
messages, or change mon to log to a facility you don't output to disk.


Alternatively, use any of the myriad systems available that perform logfile
rotation, so you don't keep your syslogs forever.


If you *really* want no syslog output at all, modify the code to add that 
feature as an option and send us a patch.


-David



Re: Problem with depedencies

2005-07-28 Thread David Nolan



--On Thursday, July 28, 2005 11:37 AM +0200 \rueh hänä\ [EMAIL PROTECTED] 
wrote:



Is something wrong with my dependencies? Or is it not possible to make
more than one service depending on another service? And, are dependencies
over different hostgroups possible?


I think the problem is that you're expecting more from dependencies than
they provide.  Assuming you have dependency behavior set to 'm', what will
happen is that test X won't be run if test Y has already detected a
failure.  But if the failure occurs *between* when Y was last run and when
X is run, then X will detect the failure first.


The answer to this is to have virtually every service have at least an
'alertafter 2' setting, so that two consecutive failures have to be
detected, and have the higher-order tests have shorter test intervals.
For example, for my web servers I have mon configured to ping them every 30
seconds, check their load average via snmp every 45 seconds, test http
every minute, and test https every five minutes.  (And the router between
my monitoring host and the web server is pinged every 15 seconds...)


-David




Re: question concerning monitor scripts

2005-07-18 Thread David Nolan



--On Monday, July 18, 2005 11:21 AM +0200 \rueh hänä\ [EMAIL PROTECTED] 
wrote:



Any hints to this ? Or is a bash-script based monitor possible, too? How
would this work?


Mon's monitor programs can be any executable format you choose.  We have 
some that are actually compiled C code.  If you're most familiar with shell 
scripts, write a shell script.  It should behave the way the documentation 
says a monitoring program should behave.


The relevant passages from the documentation are:

"Monitor processes are invoked with the arguments specified in the
configuration file, appended by the hosts from the applicable host group."

"[A monitor] should return an exit status of 0 if it completed successfully
(found no problems), or nonzero if a problem was detected. The first line of
output from the monitor script has a special meaning: it is used as a brief
summary of the exact failure which was detected, and is passed to the alert
program. All remaining output is also passed to the alert program, but it
has no required interpretation."
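
A minimal shell sketch of the contract quoted above may help. It is not taken from the mon distribution: the function names probe_host and run_monitor are invented for this example, and the single-ping probe is only a placeholder assumption (ping options also differ between systems).

```shell
#!/bin/sh
# Sketch of a mon monitor written in shell. All names are hypothetical.
# mon appends the hosts of the hostgroup to the configured arguments,
# so the hosts arrive as "$@".

# probe_host HOST: placeholder check (an assumption for illustration).
# Succeeds if HOST answers a single ping; replace with your real test.
probe_host() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# run_monitor HOST...: prints the failed hosts as the first (summary)
# line, then one free-form detail line per failure. Returns 0 when every
# host passed and 1 when at least one host failed.
run_monitor() {
    failed=""
    detail=""
    for host in "$@"; do
        if ! probe_host "$host"; then
            failed="${failed:+$failed }$host"
            detail="${detail}${host}: probe failed
"
        fi
    done
    if [ -n "$failed" ]; then
        echo "$failed"          # first line: brief summary for the alert
        printf '%s' "$detail"   # remaining output: no required format
        return 1
    fi
    return 0
}
```

A real monitor file would end by calling the function and passing its status on, for example with a last line of `run_monitor "$@"; exit $?`.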



-David

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University




Re: mon.cf features like redistribute ?

2005-07-07 Thread David Nolan



--On Tuesday, July 05, 2005 3:16 PM +0200 Jacques Klein [EMAIL PROTECTED] 
wrote:



Hello,

I downloaded mon-1.1.0pre1.tar.gz, made some experiments with it and now
I am looking for an up-to-date documentation of this version,
essentially a good description of the mon.cf syntax and
the maybe new feature redistribute.




The lack of documentation is why it's 1.1pre instead of 1.1.  :)

I'll try to work on the documentation this weekend.

For starters, I'll add this section:
redistribute alert [arg...]
 A service may have one redistribute option, which is a special form of
 an alert definition.  This alert will be called on every service status
 update, even sequential success status updates.  This can be used to
 integrate Mon with another monitoring system, or to link together multiple
 Mon servers via an alert script that generates Mon traps.  See the ALERT
 PROGRAMS section above for a list of the parameters mon will pass
 automatically to alert programs.

-David



Re: Help !!!

2005-07-07 Thread David Nolan



--On Wednesday, July 06, 2005 10:04 AM +0800 D K [EMAIL PROTECTED] wrote:


hi,
  I am a Chinese, my english is poor, so I hope you can understand
this letter.
   Yesterday I use Mon to monitor a server, I hope Mon can alert via xmpp
protocol,
   so I wrote a alert file, but mon seem not work. If I execute alert
file, it can alert to my jabber. Now I hope mon can monitor a services
is down, it can alert
   to my jabber. Please Help me! I wait your reply! Thanks!!!

 A Helper
 2005.7.6




Your English isn't too bad, but your problem reporting skills definitely
need some work.  In order to be able to help you, we need to know in what
way mon isn't working.  What did you do, what behavior did you expect, and
what behavior did mon show?


Is mon detecting your failure and not calling the alert, or calling the
alert but it fails to behave as desired?  Or is mon not detecting the
failure at all?


If your script works when you run it but not when Mon runs it, the most 
likely causes are:


- $PATH differences (i.e. your script is running some program that appears 
in your PATH but not in the PATH that mon provides to the alert script)


- privilege differences (i.e. you ran your test as root but Mon is running
as nobody, or similar)
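
One way to reproduce the first kind of difference by hand is to run the alert with an almost empty environment. The helper below is a hedged sketch: run_bare is a made-up name, and the PATH value is an assumption, not the PATH that mon itself hands to alert scripts.

```shell
#!/bin/sh
# run_bare COMMAND [ARG...]: run COMMAND with nearly all environment
# variables removed, roughly imitating a daemon that was not started
# from your login shell. The PATH below is an assumed minimal value.
run_bare() {
    env -i PATH=/bin:/usr/bin "$@"
}
```

For example, `run_bare ./jabber.alert` (jabber.alert being a hypothetical script name) fails quickly if the script relies on a program that is only in your interactive PATH. To test the privilege side as well, run the same command while logged in as the user mon runs as.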


We need more information in order to provide any better guidance for 
solving your problem.


-David



Re: wrapping long lines in mon.cf, how?

2005-06-28 Thread David Nolan



--On Tuesday, June 28, 2005 4:39 PM -0500 [EMAIL PROTECTED] wrote:




Can long lines in mon.cf be gracefully wrapped, as:

hostgroup testing_server_network 172.16.0.1 172.16.0.2 172.16.0.3 \
 172.16.0.4 172.16.0.5


or does this mess things up?





Is this possible in mon.cf:


hostgroup My Server Group

watch My Server Group

...






From the man page:
Lines are parsed as they are read. Long lines may be continued by ending 
them with a backslash (\). If a line is continued, then the backslash, 
the trailing whitespace after the backslash, and the leading whitespace of 
the following line are removed. The end result is assembled into a single 
line.
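
The continuation rule in that passage can be sketched as a small filter. This is an illustration of the rule as the man page words it, not mon's own parser; join_continued is a hypothetical name.

```shell
#!/bin/sh
# join_continued: reads config text on stdin and joins continued lines.
# For a line ending in a backslash (optionally followed by whitespace),
# the backslash and that trailing whitespace are dropped, the leading
# whitespace of the next line is dropped, and the two are concatenated.
join_continued() {
    awk '
        {
            line = $0
            # Strip leading whitespace only when continuing a previous line.
            if (pending) sub(/^[ \t]+/, "", line)
            if (line ~ /\\[ \t]*$/) {
                sub(/\\[ \t]*$/, "", line)   # drop backslash and what follows
                buf = buf line
                pending = 1
            } else {
                print buf line
                buf = ""
                pending = 0
            }
        }
        # A trailing backslash on the very last line: emit what we have.
        END { if (pending) print buf }
    '
}
```

Running `join_continued < mon.cf` on the wrapped hostgroup from the question would print it as one line, with the space before the backslash kept as the separator.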


Also from the man page:
Hostgroup entries begin with the keyword hostgroup, and are followed by a 
hostgroup tag and one or more hostnames or IP addresses, separated by 
whitespace. The hostgroup tag must be composed of alphanumeric characters, 
a dash (-), a period (.), or an underscore (_). Non-blank lines 
following the first hostgroup line are interpreted as more hostnames. The 
hostgroup definition ends with a blank line.


And:
Watch entries begin with a line that starts with the keyword watch, 
followed by whitespace and a single word which normally refers to a 
pre-defined hostgroup. If the second word is not recognized as a hostgroup 
tag, a new hostgroup is created whose tag is that word, and that word is 
its only member.
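
Read literally, these passages mean a tag cannot contain spaces: in "hostgroup My Server Group" the tag would be "My", with "Server" and "Group" taken as hostnames. A small check of the allowed characters, as a sketch only (valid_hostgroup_tag is an invented helper, not part of mon):

```shell
#!/bin/sh
# valid_hostgroup_tag TAG: succeeds if TAG is non-empty and contains only
# the characters the man page allows for a hostgroup tag: alphanumerics,
# dash, period, and underscore.
valid_hostgroup_tag() {
    case "$1" in
        ""|*[!A-Za-z0-9._-]*) return 1 ;;   # empty, or has a disallowed char
        *) return 0 ;;
    esac
}
```

With this check, testing_server_network passes while "My Server Group" is rejected; something like My_Server_Group would be an acceptable spelling.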


-David





Re: libperl.so.1 failed

2005-06-09 Thread David Nolan



--On Thursday, June 09, 2005 4:22 PM -0400 Kishore Jalleda 
[EMAIL PROTECTED] wrote:



Hi David,
  Thanks for the reply. Actually, 'perl mon' works fine,
but './mon' doesn't?
Kishore



(Let's keep this on the mailing list, so others can follow along.
Especially since I'm going on vacation tomorrow... :)


Sounds like you've got two perl installations on your machine, and the one
that mon is using isn't the one that's in your PATH.  Compare the output of
'which perl' and 'head -1 mon'.
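
That comparison can be scripted. The following is a rough sketch with invented helper names (shebang_interpreter, compare_perl); it only automates the two commands mentioned above.

```shell
#!/bin/sh
# shebang_interpreter FILE: prints the interpreter path from the "#!"
# line of FILE, or nothing if the first line is not a "#!" line.
shebang_interpreter() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# compare_perl FILE: shows the perl named in FILE next to the perl found
# through PATH, and succeeds only when the two paths are identical.
compare_perl() {
    script_perl=$(shebang_interpreter "$1")
    path_perl=$(command -v perl) || path_perl=""
    echo "script uses: ${script_perl:-none}"
    echo "PATH finds:  ${path_perl:-none}"
    [ "$script_perl" = "$path_perl" ]
}
```

Run it as `compare_perl ./mon`. If the script starts with "#!/usr/bin/env perl" the first path will be /usr/bin/env, in which case the perl found through PATH is the one being used anyway.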


-David




Re: mon user???

2005-06-08 Thread David Nolan



--On Wednesday, June 08, 2005 9:41 AM +0200 Sylvain Clerc 
[EMAIL PROTECTED] wrote:



Hello,

I would like to know whether a special user for Mon is created during
installation, because Mon doesn't have permission to execute my alert script
(start or stop Heartbeat) and I want to try using sudo to resolve my
problem.



Mon runs as whatever user you chose to run it as.  Some places run it as 
root, some places run it as another user.


Try running 'ps waux | grep mon' to find what user it's running as.

-David



Re: libperl.so.1 failed

2005-06-08 Thread David Nolan



--On Wednesday, June 08, 2005 3:36 PM -0400 Kishore Jalleda 
[EMAIL PROTECTED] wrote:



Hi,
 I am trying to install mon on Solaris 8/SPARC. The Perl version
installed is 5.8.5, and I also installed all the Perl modules required for
mon. When I try to run mon, or test any of the monitors, I get an
error:
ld.so.1: mon: fatal: libperl.so.1: open failed: no such file or
directory

There is no libperl.so.1 on the system. Am I missing something?
Please suggest.




Do you have a working perl installation?  The error you are seeing is a 
linker error that implies that your perl installation may be broken.  Try 
running some other perl scripts.


-David



Re: no monitor found while trying to run []

2005-04-28 Thread David Nolan

--On Thursday, April 28, 2005 12:03 PM -0400 george young gry@ll.mit.edu 
wrote:

I assume I've somehow specified a null-named monitor in the config file,
but I can't find the problem.  Could someone take a look?

Just guessing here, but try removing the blank line from the routers:fping 
service entry.

Also, you can run mon with debugging enabled, via '-d', to get more status 
information which might help track the problem.

-David


Re: trap received but not acted upon.

2005-04-09 Thread David Nolan

--On Friday, April 08, 2005 5:47 PM -0700 Jim Trocki [EMAIL PROTECTED] 
wrote:

> you need to use a valid period definition, i.e. something that is
> meaningful to Time::Period, such as wd {Sun-Sat}. try this:

I don't think that's his problem.  An empty period definition is valid; it
always matches.  Mon handles this correctly.

The problem is here:
   opstatus = unknown,

If he's using Mon 0.99.2 (which he is; the particular error message he
reported doesn't exist in the current code), that will cause exactly this
error.  If he's using either the latest 1.0 or 1.1 pre-release, that will
just be ignored completely, as the newer common process_event subroutine
completely ignores this tag and only processes the return value.

Hans, I suggest you should set this to either 'ok' or 'fail', depending on 
the trap you're processing.  Or just upgrade to a newer mon and be happier. 
:)

-David


Re: Monitor output size limitation

2005-04-09 Thread David Nolan

--On Friday, April 08, 2005 6:33 PM -0700 Jim Trocki [EMAIL PROTECTED] 
wrote:

> On Fri, 8 Apr 2005, David Nolan wrote:
>> This is a known bug with some regexps in perl's Text::ParseWords that is
>> tickled by large input from mon.
>
> well it's not really a bug, it's just that the default stack size is
> inadequate for regexps in that module. bump up the stack allocation with
> ulimit -s and you'll see the problem vanishes.

Ahh, that's what it was.  Back when I was seeing this problem I just
remember seeing reports that it was a 'regexp bug', and didn't bother to
track it down.  Still, you'd think perl could do a better job of detecting
the approaching stack size limit and throwing an error in that case instead
of segfaulting.

> but it's better to have
> fixed the glitch with changing the code than expecting that people run
> with a modified stack size :)

True.  Even when Mon didn't segfault, the performance of those regexps on 
significant amounts of data was sometimes horrible.  I had occasions where 
mon.cgi's parsing of the opstatus output was taking a minute or more. 
Changing the encoding so a simple split could be used instead made my mon 
interface load almost instantly.

-David


Re: trap received but not acted upon.

2005-04-09 Thread David Nolan

--On Saturday, April 09, 2005 6:54 AM -0700 Jim Trocki [EMAIL PROTECTED] 
wrote:

>> I don't think thats his problem.  An empty period definition is valid,
>> it  matches always.  Mon handles this correctly.
>
> oh. that's busted. i never realized that was the case, nor intended it
> to be so.  just reviewed the pod page for Time::Period, and i see it
> does mention that a valid period string is whitespace, but it
> doesn't say what it means. from testing the code it does return true
> when you give it an empty period string. i'm inclined to make mon treat
> the empty string as an error, since its meaning is ambiguous according
> to the documentation, and on principle.

In the documentation for Time::Period, right after it says whitespace or
the string 'none' are legal, it says:

  If the period is blank, then any time period is assumed because the
  time period has not been restricted.  In that case, inPeriod returns 1.
  If the period is none, then no time period applies and inPeriod
  returns 0.

So this seems like documented behavior to me.  Though 'none' doesn't really
make sense to ever use.  I suppose it would be useful as a way to
temporarily disable a period, without deleting all the contents.  But it
seems useful to be able to have multiple named periods that always match by
doing:

 period page_first:
 ...
 period page_second:
 ...
 period email_log:
 ...

Not that I'm doing that, since my periods are all programmatically generated.
But if you're building a config file by hand, having to specify a period
definition for something you want to always match seems silly.

-David


Re: building mon

2005-04-07 Thread David Nolan

--On Thursday, April 07, 2005 4:51 PM -0500 Armand Pirvu (yahoo) 
[EMAIL PROTECTED] wrote:

> Hi ,
> I tried to build mon and there are a couple of things.

Mon is a perl script.  You don't really *build* perl scripts so much as
install and run them.  Copy the mon program to the location of your choice,
give it a config file and run it.

> 1. Do I need SNMP ?

For Mon itself, no.  Depending on which monitor scripts you want to run,
maybe.  Which monitor plugins you run will determine what dependencies you
have.

> 2. What about Period.pm ? What is that for, where should it be ?

You do need the Time::Period and Time::HiRes perl modules.  Providing
complete details of how to install these modules is really outside of the
scope of the Mon documentation, but in most cases you can probably install
them via CPAN.  Try running 'perl -MCPAN -e shell'.  You may be prompted to
configure CPAN if you haven't used the CPAN module before.  It's probably
safe to just say 'no' at that prompt.  When you get to the 'cpan' prompt
type 'install Time::Period' and when that completes type 'install
Time::HiRes'.  If that doesn't work for you then you have a non-standard
perl setup on your machine.

You probably also want Mon::Client, also available from CPAN, or for 
download from the mon download site.

-David


Re: Mon Periods

2005-03-31 Thread David Nolan

--On Thursday, March 31, 2005 7:52 AM -0800 Chad Sobotka 
[EMAIL PROTECTED] wrote:


I have tested this out and sometimes it does work.  I bring a service
down, go to the web interface, and it reports Failed (No Alerts Sent).
However, most of the time I get an alert.  I have also tried setting the
first period to just period: instead of period p1:.

At what time of day were you doing your tests?  And how long did you leave 
the service down?  I assume you believe it wasn't long enough for your 
'alertafter 2' to be triggered.

If you change the alerts in the two periods to go to different addresses,
which one is firing?

-David


Re: problems with period RESTART: and RESTART_FAILED

2005-03-29 Thread David Nolan

--On Tuesday, March 29, 2005 11:25 AM +0200 Anquijix Schiptara 
[EMAIL PROTECTED] wrote:

if i start heartbeat, all services get up without any problems. now i
want to test mon, if it tries to restart the httpd-service, if i stop it.
mon sees, that the service isnt running anymore, but it automatically
calls the bring-ha-down.alert script in RESTART_FAILED period instead of
the restart-httpd script in the RESTART period.
if i comment out the RESTART_FAILED entries, it works with restarting the
service.
the funny thing is, this configuration worked the first time i used it,
but not the next few times. and i got the examples from a linux-magazine,
which should work.

You have two periods defined.  Neither period has an alertafter entry, so 
*BOTH* alerts will be called when a failure occurs.  Which one is run first 
is random chance (probably based on the random order from a hash table key 
lookup.)

If you want one to be called before the other you should put alertafter 
definitions in both periods.  I suggest something like:

period ATTEMPT_RESTART_FIRST:
 alert httpd_restart.alert
 alertafter 2
 alertevery 30s
period RESTART_FAILED:
 alert bring-ha-down.alert ...
 alertafter 1m
 alertevery 1m

You also might want an upalert entry in the second period that would bring
the heartbeat service back up.

-David


Re: ORing hosts in a hostgroup (instead of ANDing) for a monitor

2005-03-26 Thread David Nolan

--On Friday, March 25, 2005 10:57 PM -0300 Raul Dias [EMAIL PROTECTED] 
wrote:

Sorry if this is covered in the docs and I missed it.
Is it possible to have a monitor OR the hosts in a hostgroup, so that if
one SUCCEEDS the service is considered a SUCCESS?
An example of this is to have a hostgroup with a few internet hosts
and fping them.  If one of them succeeds then the internet connection is
ok.  Some may fail and the connection may still be ok.
However, if all of them fail, then the internet connection is supposed to
be considered down.
Did I miss something?

Is this possible?  Yes.
It needs to be a feature of whatever monitor script you're using.
Are there any monitor scripts that do this now? Yes.
If you pass '-a' to fping.monitor it will report failure only if all hosts 
fail to respond.

Another approach is to add a threshold argument to the monitor script that
causes it to allow that many hosts to be down before signaling an error.  I
already have a modified version of fping.monitor that does essentially
that, except it exits with different error codes depending upon the number
of failures.  For example, if I set the threshold to 1, then if more than
one host fails to respond the script returns 255.  I suppose I could just
commit those changes back to the main mon version, since they're entirely
optional additional features, but for now that version is available at:
https://bugzilla.andrew.cmu.edu/cgi-bin/cvsweb.cgi/src/netsage/mon/mon.d/fping.monitor?rev=1.9&content-type=text/x-cvsweb-markup
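
The threshold idea can be sketched in a few lines of shell. This is not the modified fping.monitor referred to above and does not reproduce its exit codes; probe_host and run_threshold_monitor are hypothetical names, and the ping-based probe is a placeholder assumption.

```shell
#!/bin/sh
# probe_host HOST: placeholder check (assumption); replace with a real one.
probe_host() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# run_threshold_monitor MAX_DOWN HOST...: tolerates up to MAX_DOWN
# unreachable hosts. Prints the unreachable hosts (if any) on one line
# and returns nonzero only when more than MAX_DOWN hosts are down.
run_threshold_monitor() {
    max_down=$1
    shift
    down=""
    count=0
    for host in "$@"; do
        if ! probe_host "$host"; then
            down="${down:+$down }$host"
            count=$((count + 1))
        fi
    done
    if [ -n "$down" ]; then
        echo "$down"
    fi
    [ "$count" -le "$max_down" ]
}
```

Setting MAX_DOWN to one less than the number of hosts gives the "fail only if every host is down" behavior asked about in the question.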

-David


Re: BUG: alertevery filtering fails because empty summary fails match with previous empty summary

2005-03-25 Thread David Nolan

--On Thursday, March 24, 2005 9:29 PM -0800 Michael Vogt 
[EMAIL PROTECTED] wrote:

I found that the cause is that one value was replaced by (NO SUMMARY)
if white space and the other was not.  Adding the line marked # FIX
around line 600 seems to correct the problem.

This is already fixed in mon-1.1.0pre1, available from 
http://sourceforge.net/projects/mon/ or 
ftp://ftp.kernel.org/pub/software/admin/mon/devel/

-David


Re: Bug: mon.cf keyword error in period section not detected

2005-03-25 Thread David Nolan

--On Thursday, March 24, 2005 10:30 PM -0800 Michael Vogt 
[EMAIL PROTECTED] wrote:

Not sure if this has been reported.
It is not fixed in mon-1.0.0pre5.

Also already fixed in mon-1.1.0pre1.
-David



Re: Mon and non-failure logging

2005-03-25 Thread David Nolan

--On Wednesday, March 23, 2005 10:41 AM +0100 Greg [EMAIL PROTECTED] wrote:

Hi list,
I'm a new user of Mon and thanks to the doc and the easy configuration
file syntax I now have a working monitoring system for failure detection.
But now I want more :) Are there some pairs of monitors/alerts for
usual monitoring, i.e. when detected values are in the range of
everything-is-ok but we want Mon to report the activity to a log file
(maybe an rrd database for convenience)?
I know this is not the primary usage of Mon, or what I've understood of
it, but it would be useful and seems pretty simple to develop. So before
re-inventing the wheel for the 42nd time I prefer asking (yes, I'm lazy).

There are a couple of ways you could approach this problem.
You could have your monitor script exit with different error codes, and do
different alerting based on the error code.  You can accomplish this with
either 'alert exit=10-20 foo.alert', available in mon 0.99.2 and newer,
which applies to a single alert within a period, or via 'alertexitrange
10-20', available in mon-1.1.0pre1, which applies to all alerts within a
period.
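
On the monitor side, the exit-code approach could look like the sketch below. The helper name classify_value and the codes 10 and 20 are arbitrary choices for illustration (picked only so that they fall inside the 10-20 range used in the examples above); nothing here is prescribed by mon.

```shell
#!/bin/sh
# classify_value VALUE WARN CRIT: maps an integer measurement to an exit
# status. Returns 0 below WARN, 10 from WARN up to (not including) CRIT,
# and 20 at CRIT or above. The one line printed serves as the summary.
classify_value() {
    if [ "$1" -ge "$3" ]; then
        echo "critical: value $1 (limit $3)"
        return 20
    elif [ "$1" -ge "$2" ]; then
        echo "warning: value $1 (limit $2)"
        return 10
    fi
    return 0
}
```

A monitor built on this would measure something, call for instance `classify_value "$measured" 50 90`, and exit with the returned status, leaving it to the alert definitions to decide which exit values deserve which alert.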

Or you could use 'redistribute foo.alert', available in mon-1.1.0pre1, which
causes the configured alert script to get called for every status update.
It was designed to allow you to redistribute all status updates to remote
systems (other mon hosts, or other monitoring systems).  But you could
instead make the alert script log values into an rrd or perform some other
operation.
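
As a rough idea of what such a logging script might contain, here is a sketch that appends one line per call to a plain file. It assumes that the script receives the monitor output on stdin and that mon exports MON_GROUP, MON_SERVICE and MON_RETVAL to it, as it does for ordinary alerts; whether that holds for redistribute calls should be checked against the ALERT PROGRAMS section. The function name log_status is invented.

```shell
#!/bin/sh
# log_status LOGFILE: appends "timestamp group=... service=... retval=...
# summary=..." to LOGFILE. The summary is the first line read from stdin.
# MON_GROUP, MON_SERVICE and MON_RETVAL are assumed to be set by mon;
# "unknown" is logged for any that are missing.
log_status() {
    summary=""
    IFS= read -r summary || :
    printf '%s group=%s service=%s retval=%s summary=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${MON_GROUP:-unknown}" "${MON_SERVICE:-unknown}" \
        "${MON_RETVAL:-unknown}" "$summary" >> "$1"
}
```

The script proper would then just call something like `log_status /var/log/mon-status.log` (the path is an example); feeding an rrd instead would replace the printf with the appropriate update command.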

-David


Re: Bug: mon.cf keyword error in period section not detected

2005-03-25 Thread David Nolan

--On Friday, March 25, 2005 9:39 AM -0800 Michael Vogt [EMAIL PROTECTED] 
wrote:

> I'll have to take a look at that latest version.
> How should I have known about these problems prior to noticing them
> myself? I did not see them in the SourceForge bug tracking.
> Where else should I look before reporting problems?
> Thanks for all your contributions to mon,

Sorry, there wasn't really any way for you to know about this.  These were
probably bugs which I discovered much like you did, and fixed in my local
Mon copy.  I don't know offhand whether they were in the patches that I
sent to Jim back before we started working together directly, but those
patches never got integrated anyway.

When Jim agreed to collaborate on further Mon development we re-activated 
what had been essentially a dead sourceforge project and I basically 
integrated all of my outstanding changes at once, carefully reading through 
all the changes at the time to make sure I wasn't breaking anything.  I 
also went through and essentially cleared out the pending sourceforge bug
queue at the time, because it hadn't been monitored or updated in a long 
time.

-David


Re: monitoring email capability for monitor alerts

2005-03-22 Thread David Nolan

--On Tuesday, March 22, 2005 10:01 AM -0500 Andrew Siegel 
[EMAIL PROTECTED] wrote:

> There are many things that can go wrong in the email delivery chain,
> making it undependable for alerts of an urgent nature.  Better to use
> qpage.alert to send TAP/IXO text messages to a pager.  Use a modem
> directly connected to your mon host, and connected to a direct copper
> phone line to further minimize things that can go wrong.
> We use this technique as well as email-based paging, and most of the time
> the modem-transmitted messages get to the pagers faster.

We do something very similar.  We have a custom alert script which first
attempts to contact SkyTel's SNPP server over the internet, and if unable
to contact it falls back to the SNPP server (qpage) on the local machine.

It turns out that you can do more interesting things via SkyTel's SNPP
server directly than you can via TAP/IXO.  In particular I can enable
two-way messaging, to allow our coverage people to reply to an alert from
the pager.

-David


Re: monitoring email capability for monitor alerts

2005-03-22 Thread David Nolan

--On Tuesday, March 22, 2005 8:27 AM -0800 Michael Vogt 
[EMAIL PROTECTED] wrote:

> What holes are there in the setup where I just use mail.alert and
> smsmtponitor from one monitor to the other?

The problem with monitoring email submission and reception is that you have
no way to know if the mail got all the way to the final hop.  You can spend
as much time as you want setting up a spiffy environment to verify that
mail to address A gets delivered, but that doesn't tell you anything about
address B.

You might find that your cellular providers provide a way to verify text
message delivery, if you're using their web message submission forms.  But
that's problematic because they're likely to redesign their web pages on a
whim, so any scripting of the web interaction will be fragile.

SkyTel's SNPP server provides delivery confirmation information if you use 
two-way pagers.

Have you considered a fallback approach?  In our environment we always have
two people on duty, and page the primary before the secondary.  Maybe a
similar approach with different alert mechanisms would make sense.  One
thing you could try is to alert via email first, and then by dialing the
user's cell phone directly with a modem.  Even if you don't put in a fancy
text-to-speech system, if caller ID works your admins can know "Hey, Mon is
calling, that must mean I missed an alert..."  If you've got a better
secondary alert mechanism, use that instead.

-David


Re: monitoring email capability for monitor alerts

2005-03-22 Thread David Nolan

--On Tuesday, March 22, 2005 11:37 AM -0500 Ed Ravin [EMAIL PROTECTED] 
wrote:

> I've got a newer version with the fallback paging that I'll release One
> of These Days Real Soon Now.  Maybe around the same time David releases
> his two-way Skytel code :-) :-).

While the whole alert script isn't really designed for use outside of CMU,
the relevant portion of the code is:

eval {
    # Abort if the connection attempt hangs.
    local $SIG{ALRM} = sub { die "Timeout during connection" };
    alarm $timeout;
    my $snpp = Net::SNPP->new($server,
                              #Debug => 1,
                             ) or die "Unable to connect";
    # Allow twice as long for the conversation itself.
    local $SIG{ALRM} = sub { die "Timeout during communication" };
    alarm $timeout*2;

    $snpp->_CALL('mon') || die "Failed in _CALL";
    $snpp->_HELP() || die "Failed in _HELP";
    my $help = $snpp->message;
    # Only set up two-way replies if the server advertises RPLY.
    if (grep /RPLY/, $help) {
        if ($message =~ /ALERT/) {
            $snpp->_2WAY();
            $snpp->_RPLY('[EMAIL PROTECTED]');
            $snpp->_MCRE("ack Working on it");
            $snpp->_MCRE("ack On my way");
            $snpp->_MCRE("ack Will fix later");
            $snpp->_MCRE("ack Ignoring");
            $snpp->_MCRE("disable failing");
            $snpp->_MCRE("disable-service");
            $snpp->_MCRE("disable-group");
            $snpp->_MCRE("enable failing");
            $snpp->_MCRE("enable-service");
            $snpp->_MCRE("enable-group");
        }
    }

    $snpp->send(Pager => $pager, Message => $message);
    my $status = $snpp->status;
    if ($status != CMD_OK &&
        $status != CMD_2WAYOK &&
        $status != CMD_2WAYQUEUED) {
        die "Failed to send to $pager: " . $snpp->message;
    }
    $snpp->quit || die "Failed to quit";

    $success = $server;
    #print STDERR "$server: success!\n";
};

And you need to add one line to Net::SNPP to add the non-standard RPLY
command:

sub _RPLY { shift->command("RPLY", @_)->response() == CMD_OK }


-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Upgrading to 1.1.0.

2005-03-09 Thread David Nolan

--On Monday, March 07, 2005 5:56 PM +0100 Marko Riedel 
[EMAIL PROTECTED] wrote:

> Hello there,
> we upgraded to 1.1.0. So far everything seems to be okay, but traps no
> longer work. We did not change the code at the machines that send
> traps, except to install the latest version (1) of Mon::Client.
> Now traps that used to work cause the following output:
> trap trap 1 from  grp=somegroup svc=DYNDNS, sta=255
> failure for somegroup DYNDNS 1110213302 somehost DYNDNS OKA
> As you can see the trap includes the output from the remote host,
> which says that everything is okay. We did not change the return codes
> at all. How can a trap that used to work suddenly turn into a failure?
> Thank you for your help.

Marko,
I'm trying to track this down to see if there is a bug.  The output you 
included is the syslog message that's sent when a trap is received.  The 
only problem I see in that message is that the source IP address of the 
trap isn't being filled in.

Are there any other log messages?  And can you provide a bit more detail on 
the exact failure behavior you see?  I assume that mon is just ignoring the 
trap completely.  Does it just ignore certain traps, or all of them?

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: How to succesfully control a virtual service if no real servers are defined?

2005-03-09 Thread David Nolan

--On Thursday, March 10, 2005 1:27 AM +0100 Sebastiaan Veldhuisen 
[EMAIL PROTECTED] wrote:

> I already got the scripts from Christopher de Marco (ipvs.alert and
> ipvs.monitor) which allow you to monitor whether a virtual service has real
> servers defined and take action, but I don't understand how to incorporate
> them into mon.cf.

BIG CAVEAT:  I've never used LVS myself, so I'm taking some guesses here... 
Test this in a lab environment before deploying to a real world 
environment...

In your current model I think you want to add a third watch, which might 
look something like this:

watch webmail-lvs
    service http
        description virtual server for umail unsecure
        interval 30s
        monitor ipvs.monitor -P tcp -V x.x.x.18:80 ;;
        period wd {Sun-Sat}
            alert ipvs.alert -D -P tcp -V x.x.x.18:80
            alertafter 2

That's close to right...  Note that there is no upalert defined here, 
because it would be nonsensical: trying to bring up the virtual 
server when you have just tested and determined that it's up and running 
would be silly.  The upalerts on the per-host tests will take care of 
creating the virtual server, if I'm reading ipvs.alert correctly...

> The problem is that I can't put both real http servers
> in the same host group, because I have to do different alert actions
> (read: delete the specific server from the lvs). I saw that Christopher
> had this same problem in an older thread
> (http://www.mail-archive.com/mon@linux.kernel.org/msg01427.html). I've
> contacted him, but he doesn't respond to my mails. Does anybody have a
> clue how to implement this?

Let's be precise here... you can't put both real servers in the same group 
without rewriting the alert script.  I think doing so would probably make 
some sense.  In particular you would want a modified version of ipvs.alert 
which took a port number as one option, and read the list of real servers 
to enable/disable from the summary line.  Then you could group the hosts 
together, which makes the monitoring solution much more elegant, 
especially when you start moving beyond two servers in the pool.
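
One possible shape for the core of such a modified alert, as a rough sketch
and not a replacement for ipvs.alert.  It assumes that mon passes the summary
to the alert as the first line of standard input, that the summary is a
space-separated list of the failing real servers, and that ipvsadm's -d/-a
with -t and -r is how real servers leave and rejoin the pool.  The function
name ipvs_commands and its three arguments are invented for illustration, and
it only prints the commands so they can be checked before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print one ipvsadm command per real server named in
# the summary line read from standard input.
#   $1 = "remove" (for an alert) or "add" (for an upalert)
#   $2 = virtual service address, e.g. x.x.x.18
#   $3 = port shared by the virtual service and the real servers
ipvs_commands() {
    action=$1 vip=$2 port=$3
    case $action in
        remove) flag=-d ;;
        add)    flag=-a ;;
        *)      echo "unknown action: $action" >&2; return 1 ;;
    esac
    # Assumption: the first line of stdin is the summary, i.e. the host list.
    IFS= read -r summary || [ -n "$summary" ] || return 1
    for host in $summary; do
        echo "ipvsadm $flag -t $vip:$port -r $host:$port"
    done
}
```

Once the printed commands look right, piping them to sh as root would apply
them.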

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: Trouble with -f mon option (deamon mode)

2005-02-02 Thread David Nolan

--On Wednesday, February 02, 2005 10:29 AM -0800 Michael Vogt 
[EMAIL PROTECTED] wrote:

> This changes the current directory to /.
> I want to have files used by monitors be referenced relative to the
> base directory.  It worked fine without using -f.
> The hostgroup members I use for some custom monitors are actually
> filenames.  I don't want to have to prepend the base directory.
> What is the reason for the cd /?
> What do I lose by not using -f when running from inittab?
> Thanks for any help,
> Michael Vogt

I suggest storing those files in the MON state directory, and using the 
MON_STATEDIR environment variable that is passed to monitor scripts to find 
the files.
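
A minimal sketch of how a custom monitor might do that lookup.  The helper
name state_path is made up here; the only thing taken from mon is the
MON_STATEDIR variable mentioned above.

```shell
#!/bin/sh
# Hypothetical helper for a custom monitor: turn a hostgroup member that is
# really a file name into a path under mon's state directory.
state_path() {
    if [ -z "${MON_STATEDIR:-}" ]; then
        echo "MON_STATEDIR is not set; is this running under mon?" >&2
        return 1
    fi
    case $1 in
        /*) printf '%s\n' "$1" ;;                 # already absolute, leave alone
        *)  printf '%s\n' "$MON_STATEDIR/$1" ;;   # relative, anchor it
    esac
}
```

Inside the monitor each argument would then be opened as
"$(state_path "$member")", so the daemon's working directory no longer
matters.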

The primary reason that daemons change their working directory is to avoid 
having a running daemon keep its working directory in a network-mounted 
filesystem, or in a filesystem you might want to unmount.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Mon: Action upon success?

2005-01-14 Thread David Nolan

--On Friday, January 14, 2005 11:46 AM -0600 [EMAIL PROTECTED] 
wrote:

> Does anyone here know if or how I can cause an action upon success in
> mon.cf?
> I'm working to have mon communicate with an in-house monitoring system.
> The in-house monitoring system has its own protocol and tools.
> We have everything working except that I need to send a heartbeat
> message when a test succeeds.
> For those who are curious, this heartbeat message tells the in-house
> monitor that this host and service is OK and when to expect the next
> heartbeat.  If the next heartbeat does not come within 150% of the
> heartbeat interval, an alarm goes off.
> Thanks!

Try the current version of Mon from the sourceforge CVS repository.  There 
is a new config option, 'redistribute', which is configured on a service 
(not inside a period) and runs an alert script on every status update.  We 
use this for sending mon traps between two mon servers, but it could be 
used for your function just as easily.

(Though I just realized I need to add documentation for this option.  I 
thought I'd done that already...)
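
A hedged guess at what such a service might look like, going only by the
description above (on the service, outside any period, followed by an alert
script).  The group name, heartbeat.alert and the mail address are
placeholders, and the exact syntax should be checked against the CVS version.

```
watch appservers
    service http
        interval 1m
        monitor http.monitor
        # Assumed placement: on the service, not inside a period.
        # heartbeat.alert is a hypothetical script that would run on
        # every status update, successes included.
        redistribute heartbeat.alert
        period wd {Sun-Sat}
            alert mail.alert ops@example.com
```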

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: Help with mon and process.monitor

2005-01-11 Thread David Nolan

--On Tuesday, January 11, 2005 2:07 PM +1100 Craig Reeson 
[EMAIL PROTECTED] wrote:

> Now I am getting a SNMP timeout issue (using monshow.cgi). I have tried
> increasing the timeout in process.monitor but it has made no difference.
> However, if I just run 'process.monitor -c mycom 172.28.47.60' then it
> works!

I assume your test runs of process.monitor are on the same machine as your 
mon server.  Are you logged in as the user that your mon server runs as? 
That is, could it be something about your login environment that's allowing 
the script to work?  Can you post a snippet of your mon.cf, showing the group 
definition and the service definition?
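
One quick way to test the login-environment theory is to run the monitor by
hand with nearly everything stripped from the environment.  This is only a
sketch: run_bare is a made-up name, and the environment it produces is an
approximation, not a copy of what mon hands to its monitors.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an almost empty environment, to
# flush out dependencies on login-shell settings.
run_bare() {
    env -i PATH=/usr/bin:/bin "$@"
}
```

For example, run_bare /path/to/process.monitor -c mycom 172.28.47.60, ideally
while logged in as the user mon runs as.  If it times out this way but works
from a normal login shell, some variable from the login environment is
involved.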

Also, you might want to try running this monitor script to verify that SNMP 
transactions with the target host are working.
https://bugzilla.andrew.cmu.edu/cgi-bin/cvsweb.cgi/~checkout~/src/netsage/mon/mon.d/host.monitor?rev=1.9

(That's the script we use to verify that the host is responding to SNMP, and 
to test the load average.)

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: alerts functionality

2004-11-23 Thread David Nolan

--On Monday, November 22, 2004 1:45 PM -0500 Jim Trocki 
[EMAIL PROTECTED] wrote:

> > so total alerts sent is 1+2+3...+10?
> > is the latter correct? I've only tested it up to two hosts going down
> > consecutively :)
>
> it's correct depending on how you configure mon. this is the default
> behavior, but you can change it.

Also, it should be pointed out that this is entirely dependent on the 
behavior of the monitor script.  If the script outputs a different summary, 
then Mon will alert again (unless configured not to).  Most scripts output 
the list of failing hosts as the summary, but not all.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: mon logging setup problem

2004-11-15 Thread David Nolan

--On Monday, November 15, 2004 10:54 AM -0700 Shea Frederick 
[EMAIL PROTECTED] wrote:

> Fixed that, but still not creating a log file.
> logdir = /var/log/mon
> dtlogfile = dtlog

Ah, I think you also need:
dtlogging = 1
(Forgot about that setting...)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Anyone going to LISA?

2004-11-15 Thread David Nolan
I was wondering if anyone else from the list is attending LISA '04 in 
Atlanta this week?  I'll be down there Tuesday night through Friday.  At 
last year's conference we had a very well attended Mon BOF.  If there's 
enough interest I could arrange to have a BOF session again...

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Example config for snmp traps?

2004-11-12 Thread David Nolan

--On Thursday, November 11, 2004 6:52 AM + Aled Treharne 
[EMAIL PROTECTED] wrote:

> I've been using mon for some time now, but I've recently found a need to
> have mon handle snmp traps generated by a new system. It's the end of a
> nightshift, so I may be missing something stupidly and glaringly
> obvious, but I can't see any information in the docs or example files as
> to how to set up a service to monitor snmp traps. Should I just do the
> same as for mon traps?
> Any help is most gratefully accepted.

Despite some misleading documentation, Mon currently has no native support 
for snmp traps.  In order to integrate snmp traps into Mon, you need 
software which can receive the traps and generate Mon events.

I'm attaching a message from this list from a while back where someone 
reports that they've been able to get the snmptrapd from the Net-SNMP 
package to integrate with Mon successfully.
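
The attached script itself is not reproduced in the archive, so the following
is only a sketch of the receiving half of such a handler.  It assumes the
format snmptrapd documents for traphandle programs (a hostname line, a
transport address line, then one "OID value" pair per line on standard
input) and symbolic OID output.  The function name trap_summary is made up,
and the step that actually sends the mon trap is deliberately left out.

```shell
#!/bin/sh
# Hypothetical traphandle helper. Reads one notification from stdin and
# prints "hostname trap-oid".
trap_summary() {
    IFS= read -r host || return 1
    IFS= read -r addr || return 1      # transport address, unused here
    trap_oid=unknown
    while read -r oid value; do
        case $oid in
            *snmpTrapOID*) trap_oid=$value ;;   # assumes symbolic OID names
        esac
    done
    printf '%s %s\n' "$host" "$trap_oid"
}
```

A handler built around this would be registered in snmptrapd.conf with a
line such as "traphandle default /path/to/handler", and would pass the
resulting host and trap name on to whatever sends the trap to mon.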

LooperNG looks like it might be useful for translating SNMP traps into mon 
events, but doesn't currently have native support for sending stuff to Mon. 
(They already provide a mon alert script to send events to LooperNG, but 
not vice-versa.)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!
---BeginMessage---

If you are using the net-snmp package, it's fairly straightforward to
forward traps to mon with snmptrapd. Here is a script I use as the default
traphandle for snmptrapd; it forwards selected traps to a hostgroup/service
in mon.

(See attached file: snmp2montrap)

From:     TORRESANI, Roberto [EMAIL PROTECTED]unitn.it
Sent by:  [EMAIL PROTECTED]kernel.org
Date:     01/08/02 04:32 AM
To:       [EMAIL PROTECTED]
cc:
Subject:  SNMP traps

Hi all,
 can mon receive snmp traps?
As stated in the man page it seems that snmp support isn't implemented.

Is that right?
Is there a plan on when that will be available?
Anyone of you has in the meantime created some patch for mon to enable
snmp?

Roberto Torresani




snmp2montrap
Description: Binary data
---End Message---


Re: Generating mon trap for use as heartbeat

2004-11-09 Thread David Nolan

--On Tuesday, November 09, 2004 5:32 PM -0800 Konstantin 'Kastus' Shchuka 
[EMAIL PROTECTED] wrote:

> On Tue, Nov 09, 2004 at 01:52:32PM -0800, Michael Vogt wrote:
> > I am planning to monitor some application servers on a datacenter with
> > a custom monitor plugin.  I want to have another monitor running at a
> > remote location to monitor the main monitor at the datacenter (and
> > vice-versa).  It looks like I should use mon traps in heartbeat mode.
> > How do I create the heartbeats?
>
> Why can't you use mon.monitor?
> It does not require any heartbeat, it just does what you are
> asking for, monitor mon at the other location.

Both approaches are valid, and test different things.  I currently use 
mon.monitor to test my multiple mon servers from each other, but 
mon.monitor only verifies that the remote mon process is processing 
client requests.  Adding a heartbeat service where the monitor script sends 
a trap is actually something I hadn't thought of before.

I like the idea.  It would verify that your mon server processes are 
successfully queuing monitor processes.  I've actually had a failure mode in 
my system at one point where everything looked fine except some percentage 
of my mon scripts hadn't been run in days.  It turned out that my mon 
server was constantly throttling the number of running processes, due to 
its configuration, and was running *way* behind.  This approach would 
probably have detected that problem.

Michael Vogt wrote:
> OK. I found remote.alert which sends a trap.  So I could modify this,
> or maybe use it, as is, associated with a failalways.monitor to trigger
> it.
> Still not sure if I'd be badly reinventing a wheel. Is there a clean
> proven way already implemented?
> Same thing for the configuration stab.  Is there a working example?

Using remote.alert as a base would work reasonably.  Or if you're willing 
to wait a day or so, I think I'm going to try to implement this for my 
system, and I'd be happy to post the script I use and the resulting config 
blocks as well.
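
Until then, here is a rough guess at what the receiving side could look
like, using the traptimeout service option.  The group, host name, interval
and mail address are all placeholders, and this is not the config promised
above.

```
hostgroup datacenter-mon mon1.example.com

watch datacenter-mon
    service heartbeat
        description heartbeat traps from the datacenter mon server
        # Register a failure if no trap arrives within this window.
        traptimeout 10m
        period wd {Sun-Sat}
            alert mail.alert ops@example.com
```

The sending side would be a service on the other mon server whose monitor or
alert generates the trap, for instance something derived from remote.alert
as discussed above.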

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Configuration file check

2004-11-08 Thread David Nolan

--On Monday, November 08, 2004 11:26 AM +0100 Andrea Carpani 
[EMAIL PROTECTED] wrote:

> Is there a way to check the syntax of a mon.cf configuration file before
> starting mon? Something like
> perl -c file

No, but you can ask a running mon to parse a new config file for errors. 
If you're using mon.cgi, it provides a 'Test Mon Config File' option, or 
you can just run moncmd, with the arguments 'test config'.

This has always been good enough for my needs, since I always have mon 
running.  But if you require the feature you describe, I could add it 
fairly easily.

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: mon.cf in a sql database ?

2004-09-30 Thread David Nolan

--On Thursday, September 30, 2004 3:36 PM +0200 Brice Beauvillain 
[EMAIL PROTECTED] wrote:

> Hello all,
> Is it possible for mon to have the mon.cf file in a database ?
> Thanks in advance,

There's no way to do that directly, but at CMU we wrote a system called 
NetSage which allows us to maintain the data in a database, and generate 
mon.cf files from that database.

And one of these days I swear I'm going to have the time to write some 
documentation so I can get it released...  Unfortunately I've been saying 
that for over a year.

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: mon.cf in a sql database ?

2004-09-30 Thread David Nolan

--On Thursday, September 30, 2004 5:58 PM -0400 Ed Ravin [EMAIL PROTECTED] 
wrote:

> But you can do it indirectly.  Use the esyscmd macro in m4:

Ewww.. m4.  Uh, I mean, ooh, that's kinda neat.  :)

I wonder how well m4/mon handles it when the esyscmd program takes a long 
time to return, or just fails.  I suspect not well, at least during a 
config reload.

NetSage generates the config files on its own, and then runs a script that 
copies it into place, asks Mon to test the file, and assuming it passes the 
test tells mon to reload.  An m4 macro that dynamically generates the 
contents sounds like it would make the 'test and reload' operation both 
expensive, and unpredictable, since the file would be generated twice, and 
not guaranteed to be the same.
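
The copy, test, reload sequence could be sketched roughly as below.  This is
not the NetSage script.  It makes two assumptions that should be verified
before relying on it: that moncmd exits non-zero when 'test config' fails,
and that 'reset' is the command that makes mon re-read its config.  The
function name and the backup step are additions for illustration.

```shell
#!/bin/sh
# Hypothetical deploy step: copy a generated config into place, ask the
# running mon to test it, and reload only if the test passes.
# MONCMD can be overridden, e.g. MONCMD="moncmd -s monhost".
deploy_mon_cf() {
    new_cf=$1 live_cf=$2
    cp "$live_cf" "$live_cf.bak" || return 1
    cp "$new_cf" "$live_cf" || return 1
    # ASSUMPTION: moncmd exits non-zero when "test config" fails. If it
    # does not, inspect its output here instead.
    if ${MONCMD:-moncmd} test config; then
        # ASSUMPTION: "reset" is the reload command.
        ${MONCMD:-moncmd} reset
    else
        cp "$live_cf.bak" "$live_cf"     # put the old file back
        return 1
    fi
}
```

Because the file is generated once and then only copied, the config that is
tested is exactly the config that is loaded, which is the property the m4
approach would lose.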

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: Using Mon to modify DSN records

2004-09-20 Thread David Nolan

--On Monday, September 20, 2004 8:12 AM -0700 Nate Campi [EMAIL PROTECTED] 
wrote:

> If you're using BIND it's generally best to use nsupdate since you're
> not likely to introduce errors into the zone file this way.

There also is a Perl module that can handle this, Net::DNS::Update, which 
is part of the Net::DNS package.  We use it extensively to do thousands of 
DDNS updates a day to our zones.
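
For the nsupdate route, an alert script could generate the update commands
along these lines.  This is a sketch with invented names (a_record_update
and its four arguments); it is not what we run, and it only prints the
commands so they can be reviewed or piped to nsupdate.

```shell
#!/bin/sh
# Hypothetical helper: print an nsupdate script that replaces the A record
# for a name.
#   $1 = zone, $2 = fully qualified name, $3 = TTL in seconds, $4 = new address
a_record_update() {
    zone=$1 name=$2 ttl=$3 addr=$4
    cat <<EOF
zone $zone
update delete $name A
update add $name $ttl A $addr
send
EOF
}
```

For example: a_record_update example.com www.example.com. 60 192.0.2.10 |
nsupdate -k /path/to/keyfile, where the key file is whatever your server
requires for dynamic updates.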

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Why no saved state for acks?

2004-09-16 Thread David Nolan

--On Thursday, September 16, 2004 11:02 AM -0700 Augie [EMAIL PROTECTED] 
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Back in 2001 Ed Ravin said the following:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00014.html
> > I'm thinking of coding a patch to mon to include the state of ACK'd
> > services in the saved state.  My problem is that if I ACK a service
> > and my mon server gets rebooted for some reason, it will start paging
> > people on alerts that were already acknowledged.
>
> ACK state still does not seem to be kept in mon-1-0-0pre4, so my
> question is; is anyone working on this already?

Try using the development version from the sourceforge CVS repository.  It 
has full scheduler state saving capability.

I think there is still a small bug with saving ACKs, where sometimes it 
loses the ACK during a reload, but I'd say it currently works 99% of the 
time...



Re: DNS Monitor

2004-09-13 Thread David Nolan

--On Monday, September 13, 2004 10:26 AM -0300 Dalpi [EMAIL PROTECTED] 
wrote:

> Hello all,
> Has anyone faced the following error when using DNS monitor?
> Zone 'nova.net': failed servers: x.x.x.x
> Diagnostics:
>   SOA query for nova.net from x.x.x.x failed question section
> incomplete
> I'm not being able to discover what is causing this failure. I've
> captured the packets, but it seems that there is no error in the
> query/answer.

I don't recall seeing that before offhand, but I'm not sure.
Run 'dig soa nova.net @x.x.x.x' and see if the result code/content makes 
sense.  If you're not sure what it should look like, please either post the 
output or send it to me personally if you're concerned about posting the 
data.
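
If you end up repeating that check against several servers, a small helper
can pull out the one field that is easiest to compare.  This is only a
sketch; soa_serial is a made-up name, and it assumes the one-line answer
format printed by 'dig +short soa'.

```shell
#!/bin/sh
# Hypothetical helper: print the serial number from one line of
# "dig +short soa ZONE @SERVER" output read from standard input.
# The fields are: mname rname serial refresh retry expire minimum.
soa_serial() {
    read -r mname rname serial rest || true   # tolerate a missing newline
    [ -n "$serial" ] && printf '%s\n' "$serial"
}
```

Usage would be along the lines of: dig +short soa nova.net @x.x.x.x |
soa_serial.  No output (and a non-zero exit) means the server gave no usable
SOA answer.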

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: watch http on one server and several ports

2004-08-12 Thread David Nolan

--On Thursday, August 12, 2004 5:50 PM +0200 Antoine Reboul 
[EMAIL PROTECTED] wrote:

> Hi,
> (Sorry for my poor English, I'm French.)
> I have a high-availability solution (lvs / mon / heartbeat).
> My web servers host 2 web sites.
> WebsiteA : adressIP:80
> WebsiteB : adressIP:8099
> I want Mon to watch each port, so I wrote this:
> --- part of mon.cf --
> watch RealServer
>     service http
>         interval 30s
>         monitor http.monitor -p 80
>         period wd {Sun-Sat}
>             numalerts 1
>             alert lvs.alert
>             upalert lvs.alert
>     service http
>         interval 30s
>         monitor http.monitor -p 8099
>         period wd {Sun-Sat}
>             numalerts 1
>             alert lvs.alert
>             upalert lvs.alert
> ---

Your services need to have unique names, so 'service http' and 'service 
http-8099' or similar.  (And the fact that Mon doesn't notice this and 
complain is a bug.)
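
With that one change applied, the quoted block would read as follows
(untested; everything except the second service name is copied from the
original):

```
watch RealServer
    service http
        interval 30s
        monitor http.monitor -p 80
        period wd {Sun-Sat}
            numalerts 1
            alert lvs.alert
            upalert lvs.alert
    service http-8099
        interval 30s
        monitor http.monitor -p 8099
        period wd {Sun-Sat}
            numalerts 1
            alert lvs.alert
            upalert lvs.alert
```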

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: shutdown heartbeat

2004-07-08 Thread David Nolan

--On Thursday, July 08, 2004 8:30 AM +0200 mixo [EMAIL PROTECTED] wrote:

> How can I shut down heartbeat from an alert script? This does not seem to
> work:
> # +
> #!/bin/sh
> /usr/lib/mon/alert.d/mail.alert $*
> /etc/init.d/heartbeat stop
> # +
> The email is sent out, but heartbeat is not stopped.

Are you running Mon as a user that can shutdown heartbeat?
If you run the script by hand, does it shut down heartbeat?
If so, figure out what is different between your environment and the one 
that Mon provides.  Most likely PATH is set differently, and the heartbeat 
script isn't setting PATH or fully specifying the path to some program.
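
If PATH does turn out to be the difference, the wrapper can set it itself.
A sketch, with the caveat that the directory list is a guess for a typical
Linux box and that MAIL_ALERT and HEARTBEAT_INIT are names invented here so
the wrapper can be tried without touching the real commands:

```shell
#!/bin/sh
# Sketch of the alert wrapper with an explicit PATH.
stop_heartbeat_alert() {
    # Do not rely on whatever PATH mon was started with.
    PATH=/sbin:/usr/sbin:/bin:/usr/bin
    export PATH
    # "$@" keeps arguments containing spaces intact, unlike $*.
    "${MAIL_ALERT:-/usr/lib/mon/alert.d/mail.alert}" "$@"
    "${HEARTBEAT_INIT:-/etc/init.d/heartbeat}" stop
}
```

The alert script itself would end with: stop_heartbeat_alert "$@".  Running
it by hand with MAIL_ALERT=true and HEARTBEAT_INIT=echo shows what would be
called without sending mail or stopping anything.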

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: INSTALL updates

2004-07-08 Thread David Nolan

--On Wednesday, July 07, 2004 11:26 AM -0700 Eric Sorenson 
[EMAIL PROTECTED] wrote:

> I got frustrated trying to show someone how to install mon, so I rewrote
> chunks of the INSTALL doc to match reality. Apply or ignore as you see
> fit.

Excellent, thanks.
Jim, I'll apply these.
-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: fping and root permissions

2004-07-06 Thread David Nolan

--On Monday, July 05, 2004 4:50 PM -0700 Joubin Moshrefzadeh 
[EMAIL PROTECTED] wrote:

> If I run mon as a regular user and am using fping.monitor, I get the
> error about needing root permissions or running fping with setuid root.
> how do you do the setuid thing?

As root:
chmod +s /path/to/fping

> i had the same problem before using perl's ping module and trying to do
> an icmp ping...

This wouldn't be solvable without using suidperl, which introduces a whole 
slew of other issues.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Why doesn't _trap_timer get reset?

2004-06-28 Thread David Nolan

--On Monday, June 28, 2004 3:36 PM -0400 Jim Trocki [EMAIL PROTECTED] 
wrote:

> On Mon, 28 Jun 2004, David Nolan wrote:
> > While it doesn't add any bugs, I don't believe it fixes any either.
>
> it does indeed fix the bug where a received trap would not reset the
> _trap_timer, preventing traptimeout from working at all. i've tested it
> and it works properly now.

trap timeouts work already.  I get them on occasion.

> > Careful reading of the code makes it clear that _trap_timer is only ever
> > relevant after a timeout has already occurred.
> > It prevents a timeout alert from happening on every pass through the
> > code.
>
> that is not true. _trap_timer is what counts down timeout counter in the
> first place. it is what gauges whether or not a timeout has occurred.
> once a timeout happens, as indicated when _trap_timer drops to zero or
> below, is that do_alert is called and _trap_timer is then reset to the
> value of traptimeout, and it starts counting down again.
> what's supposed to prevent _trap_timer from hitting 0 in the first place
> is the reception of a trap, and that is what was broken, and the patch i
> posted fixes that.

Here's the code that actually decides whether or not to call 
handle_trap_timeout:

if ($sref->{_trap_timer} <= 0 && $tm - $sref->{_last_trap} >
    $sref->{traptimeout}) {
    $sref->{_trap_timer} = $sref->{traptimeout};
    handle_trap_timeout ($group, $service);
}

(This is from the CVS head version; the mon-1-0-pre* version uses
_last_uptrap, not _last_trap.  IIRC I decided that was a logic bug and 
fixed it in my code.  But that's a different issue.)

Note the second half of the if clause.  The if clause is confusing, so I'll 
re-order it and put some parens in:

if ((($tm - $sref->{_last_trap}) > $sref->{traptimeout}) &&
    ($sref->{_trap_timer} <= 0)) {
    $sref->{_trap_timer} = $sref->{traptimeout};
    handle_trap_timeout ($group, $service);
}

So there are two clauses.  One tests whether we've received a trap
within the traptimeout window.  The other checks whether _trap_timer has 
run down to zero or below.  And since the only code that ever resets 
_trap_timer is inside the if statement, the only reasons it would still be 
above zero are that a trap timeout has fired recently, or that Mon itself 
started recently.
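
To make the interaction of the two clauses easy to poke at, here is the same
condition restated as a tiny shell function.  It is a model for
experimenting, not mon code: the function name is invented, and the values
are plain integers such as seconds.

```shell
#!/bin/sh
# Shell model of the condition above; succeeds when a timeout alert would
# be raised.
#   $1 = current time, $2 = time of the last trap,
#   $3 = traptimeout, $4 = current _trap_timer value
timeout_fires() {
    now=$1 last_trap=$2 traptimeout=$3 trap_timer=$4
    [ $((now - last_trap)) -gt "$traptimeout" ] && [ "$trap_timer" -le 0 ]
}
```

With a traptimeout of 600: timeout_fires 1000 100 600 0 succeeds (no trap
for 900 and the timer has run out), timeout_fires 1000 900 600 0 fails (a
trap arrived 100 ago, even though the timer is at zero), and timeout_fires
1000 100 600 30 fails (the timer is still holding off a repeat alert).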

Again, I believe the only bug here is that the code is confusing. 
(Either it's confusing both you and Tim, or it's confusing me.  I believe 
it's you. :)  Either _trap_timer should be made the only thing that controls 
timeouts (apply the patch to reset on each trap, and remove the second 
clause of the existing if statement) or it should be removed and replaced 
with the _last_traptimeout style code as I suggested earlier.  Either way 
is acceptable to me.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Why doesn't _trap_timer get reset?

2004-06-27 Thread David Nolan

--On Saturday, June 26, 2004 6:57 PM -0500 Tim Klein 
[EMAIL PROTECTED] wrote:


> But what if the trap never times out?  It appears that the
> value of _trap_timer just keeps getting decremented forever!
> (There's a different conditional that keeps alerts from being
> sent after it gets below zero.)  I can't find anything in the
> code that could ever reset it.  Am I misunderstanding the
> intended purpose of _trap_timer?

Tim,
Having just read this code, I'll agree that it's a bit confusing.  But I 
don't believe this is a bug.

Essentially _trap_timer is used entirely as a way to prevent trap timeout 
alarms from happening on every pass through the code after the timeout is 
reached.  I.e. the actual check for the timeout is where it compares
($tm - $sref->{_last_trap}) to $sref->{traptimeout}.  And then when a 
trap timeout actually occurs, _trap_timer is reset so that no more timeout
alerts will be sent until that much time has passed again.

Does that help?
-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University


Re: Problem with _trap_timer and long trap timeouts?

2004-06-27 Thread David Nolan

--On Sunday, June 27, 2004 1:45 PM -0500 Tim Klein 
[EMAIL PROTECTED] wrote:

> Since we're on the topic of that _trap_timer thingie...
> Upon launch or reset of mon, each trap's _trap_timer is set to
> the value of its traptimeout.  After that, _trap_timer keeps
> getting decremented as time progresses.  This seems to make
> sense.
> But, as I read the code, it's impossible for an alert to be
> sent about a trap timeout unless _trap_timer has reached 0.
> So let's say I have a trap whose timeout is 1 month.  I can't get
> alerted about this trap until at least 1 month has passed since
> the most recent launch or reset of 'mon', right?  So does that
> mean the only way I'll ever know about a timeout of that trap is
> if I manage to go a month without relaunching or resetting mon?

Yes, that is true.  But think about the reverse case, where you have 
received a trap within the last month, but Mon has restarted since then.

Basically, unless Mon is remembering full opstatus information between 
restarts, the timer must be initialized to the full timeout value at 
startup.  And the current Mon releases (both 0.99.2 and the 1.0-pre* 
versions) don't remember it.

Note that the code in the sourceforge CVS head contains support for saving 
and restoring the full opstatus.  However your question led me to look at 
that code and notice that _trap_timer isn't saved in the current code.  I'll 
fix that and commit it shortly.  (We don't use trap timeouts very much, so 
I'd never noticed before.)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!
___
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon


Re: Are traps processed while scheduler is stopped?

2004-06-23 Thread David Nolan

--On Wednesday, June 23, 2004 10:44 AM -0500 Tim Klein 
[EMAIL PROTECTED] wrote:

> If I pause the scheduler by doing moncmd stop, presumably
> I won't get alerted about traps or trap timeouts.  But will
> incoming traps still get noticed?  That is, will last_trap
> still get updated as traps arrive, even though the scheduler
> is stopped?

Actually with Mon 0.99.* and the 1.0.pre* code you *WILL* get alerts if you
get traps while the scheduler is stopped.  This is one of the bugs that are 
fixed in my code, i.e. in the CVS HEAD on sourceforge, soon to be released 
as Mon 1.1.*.

In that code alerts will not happen when traps are received while the 
scheduler is stopped, but the traps will be processed, and their information 
will show in the user interface.  (But any non-trap services will obviously 
not be running, and stale data will sit around.)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Are traps processed while scheduler is stopped?

2004-06-23 Thread David Nolan

--On Wednesday, June 23, 2004 9:48 AM -0700 Jim Trocki 
[EMAIL PROTECTED] wrote:


> ok well whether or not it's a bug may be debatable (traps aren't scheduled
> so stopping the scheduler shouldn't affect them), but it sounds like the
> behavior of your version may be more intuitive. maybe not. i don't know.
> there seem to be a number of nuances here, and i originally intended
> moncmd stop to control these first two:
> related to scheduling loop:
>   - stopping monitors from being scheduled thus stopping the
>     possibility of alerts from them
>   - stopping trap timeout alerts
> not related to the scheduling loop:
>   - stopping alerts from traps
>   - delaying/stopping the processing of inbound traps
> thoughts about the other two, or maybe more?

My thought here is that 'moncmd stop' is the emergency stop button.  i.e. 
"Something is horribly broken, mon is paging everyone about everything. 
STOP!"  Thus all things directly monitored should stop being monitored, 
and absolutely no alerts should be generated for any reason.

That's basically what I made it do.
-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


RE: mon 1.0.0pre2 and mon-client 1.0.0pre2 are in cvs on sourceforge

2004-06-21 Thread David Nolan

--On Monday, June 21, 2004 7:20 AM -0700 Jim Trocki [EMAIL PROTECTED] 
wrote:

> On Mon, 21 Jun 2004, Peter Wirdemo (MO/EMW) wrote:
> > It must be a little odd, releasing a 1-0-0 version, for a software
> > nearly 10 years old...
>
> not really odd at all--it's just the next release version. it could be
> named anything at all. would 7 be a better version number?

How about Pi?

On a more serious note, Jim and I are working pretty closely now on the new 
Mon code via sourceforge.  All of my changes have been applied to the CVS 
repository, and once we've tested them a fair bit you can expect to see a 
mon-devel-1.1.X branch available.

(Of course, I'm about to head to conferences for two weeks.  Anyone on the 
list going to be at Usenix?)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: david nolan's patches

2004-06-08 Thread David Nolan

--On Monday, June 07, 2004 1:32 PM -0700 Jim Trocki [EMAIL PROTECTED] 
wrote:

> On Thu, 3 Jun 2004, David Nolan wrote:
> > (In fact, I may have posted it to the list, but I can't recall
> > right now.  Time for some email archeology.)
>
> ahh, i apologize for my confusion. clearly my recollection was faulty,
> and you now corrected it. thanks.
>
> as far as maintaining the code in cvs with the intention of allowing
> better cooperation amongst ourselves, i think it's a good idea. i don't
> know if the sourceforge thing is what would be best. it does have some
> advantages, such as the bug tracking functionality, mon is already
> a registered project there and all (i haven't looked at that thing in
> forever), but cvs tends to aggravate me. i guess i've been living with it
> long enough to just accept it if that's all that sourceforge offers. i'd
> prefer giving subversion a try.

Since I use CVS every day for all the projects I work on, CVS would be fine 
with me.  If you prefer another option, I'm sure we can work it out.

Ultimately, I'll be continuing to maintain the CMU custom version in our 
CVS tree, and importing changes from your version.  So I'll have to deal 
with two different repositories anyway.

Sourceforge & CVS would seem to be the easiest path, and as Scott points 
out it gives us some other features as well.  If you don't have any 
strenuous objections, why don't we go ahead and start using sourceforge? 
Upload the current stable version and the devel version, give me access, 
and I'll work on integrating my changes.  (My sourceforge userid is vitroth)

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: status of Mon development failures?

2004-06-03 Thread David Nolan

--On Thursday, June 03, 2004 10:30 AM -0700 Jim Trocki 
[EMAIL PROTECTED] wrote:

> -rw-r--r--  1 536  536  179923 Apr 23 13:37 mon-0.99.3-41.tar.gz

It might help if you made announcements about new dev versions being 
available.

Have you started integrating any of the patches I've sent you yet?  If not, 
are you going to do so anytime soon, or should I just give up?  Multiple 
subscribers to the mon mailing list have asked me for a copy of my patched 
version of Mon.  I've given it to several and received nothing but positive 
feedback.  I've been resisting the urge to package up and release CMU-Mon 
as a fork, but maybe I should.  Part of the reason the NetSage 
mon configuration system hasn't been released yet is that it's really 
designed to take advantage of all the features I added.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: david nolan's patches

2004-06-03 Thread David Nolan

--On Thursday, June 03, 2004 10:52 AM -0700 Jim Trocki 
[EMAIL PROTECTED] wrote:

> this is a matter of historical record which should be public. rather
> than post his patched version to the mailing list for everyone to have a
> gander at and do something with if they chose, he sent them only to me
> (afaik), and since then i've been implicated as the reason why those
> patches haven't been distributed to anyone else. i don't think that's
> the right way to make progress, so i'm posting the diff between what he
> sent to me and the closest release to it at the time, which is 0.99.2.

If you're looking to have an accurate historical record, you should at 
least post the long description I sent you of the patch.  As I recall, I 
itemized the entire patch, breaking it down into about 20 different 
changes, and for EVERY LINE in the patch I documented which changes it was 
a part of.  I spent a couple of hours doing that, so that you could pick 
and choose which portions of the patch you wanted to apply.

If you no longer have that information, I can dig it up.  (In fact, I may 
have posted it to the list, but I can't recall right now.  Time for some 
email archeology.)

By the way, Jim, I don't want you to feel like we're upset with you 
personally.  But the problem is that last spring the issue of new mon 
releases came up, and we had several people interested in doing joint 
development of the system.  But you spoke up and said you had some new 
versions for us to test, and you still wanted to be the primary maintainer. 
We all accepted that and trusted you to move the project forward.  But it 
has become increasingly clear to most of us that you just don't have the 
time to do more than maintenance releases to Mon, and maybe not even that. 
Many of us are willing to volunteer our time to help Mon continue to evolve 
and become a better system.  Please let us help you!



Re: status of Mon development failures?

2004-06-03 Thread David Nolan

--On Thursday, June 03, 2004 11:02 AM -0700 Jim Trocki 
[EMAIL PROTECTED] wrote:

> On Thu, 3 Jun 2004, Jim Trocki wrote:
> > yeah, something's funny there. i saved the message by using the pipe
> > raw text command in pine then ran uudeview on it, and that's what i
> > got. in the raw message it has no --ikeVEW9yuYc//A+q to terminate the
> > mime attachment.
> >
> > i'll have a look at this one you just sent and stick it into the latest.
> > thanks.
>
> wtf, the one you just sent has the same problem. maybe it's an mua
> problem on your end? i had a look at the mail as delivered by the mta
> on my end and it doesn't have an ending --BXVAT5kNtrzKuDFl. maybe that
> line after the format STDOUT thing which begins with --- is messing
> things up somehow, since there's a blank line after that and nothing else.
> try gzipping the thing first then sending that as an attachment.

It looks like the Mon mailing list is playing games with removing lines 
starting with dashes, and following lines.  My last message had a signature 
that looked like the following, with a dash before my name on the first 
line of the sig, but the copy I got back from the list didn't have the 
dash.  I bet the signature stripping is being overzealous and hitting 
attachments as well.
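
A minimal sketch of how that kind of truncation could happen. How the list software actually strips signatures is not known here, so `strip_from`, the sample message, and the `BOUNDARY` string are all made up for illustration. The only fixed point is the usual convention that a signature starts at a line containing exactly "-- " (dash, dash, space).

```perl
use strict;
use warnings;

# Hypothetical stripper: cut the message at the first line matching $delim.
sub strip_from {
    my ($delim, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        last if $line =~ $delim;
        push @kept, $line;
    }
    return @kept;
}

# Made-up message: one attachment whose body contains a dashed line.
my @message = (
    "--BOUNDARY",                # opening MIME boundary
    "Content-Type: text/plain",
    "",
    "first half of the patch",
    "---",                       # dashed line inside the attachment
    "",
    "second half of the patch",
    "--BOUNDARY--",              # closing MIME boundary
);

my @loose  = strip_from(qr/^---/,  @message);   # overzealous: any dashed line
my @strict = strip_from(qr/^-- $/, @message);   # only the "-- " delimiter

printf "loose keeps %d of %d lines\n",  scalar @loose,  scalar @message;
printf "strict keeps %d of %d lines\n", scalar @strict, scalar @message;
```

With the loose pattern the message ends at the dashed line, so the closing boundary never arrives. That matches the symptom Jim describes. Gzipping the attachment sidesteps the problem because the encoded body no longer contains lines that start with dashes.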

David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University


Re: david nolan's patches

2004-06-03 Thread David Nolan

--On Thursday, June 03, 2004 2:25 PM -0400 Ed Ravin [EMAIL PROTECTED] 
wrote:

> On Thu, Jun 03, 2004 at 10:52:03AM -0700, Jim Trocki wrote:
> > this is a matter of historical record which should be public. rather
> > than post his patched version to the mailing list for everyone to have a
> > gander at and do something with if they chose, he sent them only to me
>
> Sounds like he wanted to respect your role as maintainer of Mon, and run
> major changes by you before releasing them to anyone else.  The patches
> probably arrived at a moment when you didn't have time to look at them,
> allowing the misunderstanding and subsequent miscommunication to fester.

Bingo.  In fact, here's a quote from a message I sent to mon-l last June:

If anyone is interested in using my code, contact me and I'll point you to 
our CVS repository.  (Note: I'm *not* interested in forking mon, but if 
more people are testing my code, maybe Jim will be willing to integrate it 
into the mainline more quickly.)

I even got a request for access from Jim, and in the message I sent him I 
gave the URL for our CVS repository and said (among other things):
I'm intending to fix these issues before sending you a patch.  But, as I 
said, I'm waiting till you release something resembling my CVS version 2.0 
(which is the version I assigned to the last patch I sent to you), and then 
I'll send you another patch, or patches. I'm not going to send this URL to 
the mon list.  I don't want tons of people using this code, because I'm 
trying to discourage a mon fork.


> These kinds of problems would be less likely to happen if we were using
> Sourceforge or the like, since both the latest development version and
> submitted patches would be publicly visible to all.

Any publicly available CVS repository would be great.  I'm not sure whether 
sourceforge is the best option, but ultimately I don't care as long as it 
works.

David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University


Re: Frustrating hangs...

2004-06-01 Thread David Nolan

--On Tuesday, June 01, 2004 6:24 PM -0700 Ray Van Dolson 
[EMAIL PROTECTED] wrote:

> We're using WebMonkey as a front-end to mon (latest development version)
> and we're getting extremely slow performance as it queries the mon server.

Are any of your service tests outputting large amounts of text?  i.e. a 
hundred lines or so?  One of the modules that Mon uses (Text::ParseWords) 
is horribly inefficient, and basically becomes unusable with that much 
output.  (Large numbers of hosts in one hostgroup might exhibit the same 
behavior, if I remember right.)

Unfortunately, the only solution is to eliminate the usage of that perl 
module.  I've done that in my local version of Mon and the performance 
improvement was incredible.  There are other reasons to eliminate that 
module anyway (it can cause perl to segfault).  If you're interested in 
those patches, let me know.

-David
David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Downtime Log Bug?

2004-04-23 Thread David Nolan


--On Friday, April 23, 2004 10:49 AM +0200 Christian Hertel 
[EMAIL PROTECTED] wrote:

> Our mon always tells us that its Downtime Logfile starts at 1.1.1970.
> Even if I erase the dt_logfile, the same error occurs after a few
> minutes.

There is a bug where blank lines from the dtlog are output to the client, 
and the client interprets the timestamp as zero.

The fix is a single-line change.  Search for this line in mon:
 sock_write ($fh, $_ ) if (!/^#/);
and replace it with:
 sock_write ($fh, $_ ) if (!/^#/ && !/^\s*$/);
(Yet another bug that I've had fixed in our copy of Mon for 1.5 years, but 
that I haven't submitted to Jim because he hasn't released the last set of 
patches I sent him.)
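
A minimal sketch of why the extra blank-line test matters. The `@dtlog` contents below are made up, since the real record layout is not shown in this thread, and the loop only imitates a client that reads the first field of each line as an epoch timestamp.

```perl
use strict;
use warnings;

# Made-up stand-in for dtlog contents: a comment, a blank line, one record.
my @dtlog = ("# comment line\n", "\n", "1082709000 rest-of-record\n");

my @old = grep { !/^#/ } @dtlog;                # test without the blank-line check
my @new = grep { !/^#/ && !/^\s*$/ } @dtlog;    # test with the blank-line check

# Imitate a client: take the first field as an epoch timestamp. A blank
# line has no fields, so the timestamp falls back to 0.
my @dates;
for my $line (@old) {
    my ($ts) = split ' ', $line;
    $ts = 0 unless defined $ts && length $ts;
    my @t = gmtime($ts);
    push @dates, sprintf("%04d-%02d-%02d", $t[5] + 1900, $t[4] + 1, $t[3]);
}
print "$_\n" for @dates;
```

Without the blank-line check, the blank line reaches the client and is dated 1970-01-01, which is the 1.1.1970 start date reported above. With the check, only the real record is sent.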

-David

David Nolan*[EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
 a herd of rogue emacs fsck your troff and vgrind your pathalias!


Re: Segmentation fault when running under mod_perl

2004-03-05 Thread David Nolan


--On Friday, March 05, 2004 11:34 AM +0100 Stephane Bortzmeyer 
[EMAIL PROTECTED] wrote:

> I use Mon::Client and mod_perl to serve information from mon on the
> Web. At the command line, everything is fine, but when running under
> mod_perl (either Mason or Apache::Registry), I experience the infamous
> Segmentation fault when calling things like list_opstatus. Other mon
> commands work.

Are any of your monitor scripts returning particularly large summary/detail 
messages?  Or are you running a large number of tests?

There are some known bugs in Perl's regexp parser that Mon occasionally 
runs into.  In particular, Mon 0.99.2 uses Text::ParseWords which is both 
horribly slow and has ridiculously complex regexps that sometimes cause 
Perl to segfault, especially if the input data is large.

I've patched my copy of Mon and Mon::Client to use split in the cases that 
are most likely to cause a problem.  If you're interested I can send you 
the patch.  It unfortunately requires a small change in the Mon client 
protocol, but any program that uses Mon::Client should work fine.  And the 
change for non Mon::Client programs is probably 3 lines.
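
A rough sketch of the trade-off, not the actual patch. The sample fields and the tab delimiter are assumptions made up for illustration, and the real Mon protocol lines are not reproduced here. `quotewords` is the standard Text::ParseWords function. The point is that once fields are framed by a delimiter that cannot occur inside a value, a plain `split` recovers them without any quote-parsing regexps.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Quoted framing: whitespace-separated, with quotes protecting embedded spaces.
my $quoted = q{routers ping "host1 host2 unreachable"};
my @from_parsewords = quotewords('\s+', 0, $quoted);

# Hypothetical alternative framing: tab-separated, no quoting needed.
my $tabbed = join "\t", 'routers', 'ping', 'host1 host2 unreachable';
my @from_split = split /\t/, $tabbed;

print join('|', @from_parsewords), "\n";
print join('|', @from_split), "\n";
```

Both calls yield the same three fields. Changing the framing is what makes a change to the wire format necessary, which is presumably why the patch touches the client protocol.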

IMO, this is one of the big reasons why we *really* need a new stable 
release of Mon.  I hope this is fixed in the development version, but I 
haven't personally tested it, as my Mon infrastructure is heavily dependent 
on the changes I've made to Mon, and Jim hasn't yet applied any of the 
patches I've sent him, as far as I know.

-David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University

