Let's keep this on the mailing list, so others can follow along...


--On Friday, July 22, 2005 2:28 PM +0300 Razvan Cojocaru <[EMAIL PROTECTED]> wrote:
I think they just stick around.
Runs fine for a few hours, but then more processes appear.
The actual monitoring still works, alerts are sent, but mon.cgi doesn't
work. The page keeps loading, and nothing is displayed. If I do a killall
mon, immediately the page loads fine, but complains about the daemon not
running.


I'm guessing most of them are short-lived, but it looks like you've got
two mon processes that have been around for a while.

What version of mon are you running?  There was a bug in one of the
pre versions where the fork to run an alert could end up not exiting
when done, resulting in two running mon processes.  I think that's
fixed in the current CVS version, and didn't exist in 0.99.2.

As I said, version is mon-0.99.2-r1 (as Gentoo says)

Oops, I missed that before in your original message.

I just went back and read the 0.99.2 code and found the bug I was referring to *did* exist in that code. You've got an alert script which is failing to be executed. When the exec fails, the forked child returns instead of exiting, so it keeps running as a second mon process. Mon may be syslogging a message indicating which alert is failing.

I *highly* recommend you upgrade to the 1.1pre1 version, as it has many bug fixes. If you can't do that, here's the one-line change you need to make. Search for a passage in mon that looks like:
       if (!exec @execargs) {
           syslog ('err', "could not exec alert $alert: $!");
           return undef;
       }

And change the 'return undef;' to 'exit(1);'
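
To illustrate why that one line matters, here is a minimal sketch of the same fork/exec pattern. It is a Python analogue, not mon's actual Perl code: the function names, the alert path /nonexistent/alert and the exit code 42 are all made up for the demonstration. After a fork, a child that cannot exec has to exit; if it returns instead, it carries on running the caller's code as a second copy of the daemon.

```python
import os


def try_exec_alert(argv, exit_on_failure):
    """Runs in the forked child: replace this process with the alert script."""
    try:
        os.execv(argv[0], argv)  # only returns (by raising) if the exec fails
    except OSError as err:
        os.write(2, f"could not exec alert {argv[0]}: {err}\n".encode())
        if exit_on_failure:
            os._exit(1)  # the fix: the child ends here
        return None      # the bug: the child returns into the caller's code


def child_keeps_running(exit_on_failure):
    """Return True if the child outlives a failed exec.

    The child exits with the made-up code 42 when it gets past
    try_exec_alert(), which stands in for "falls back into the main loop".
    """
    pid = os.fork()
    if pid == 0:
        # Hypothetical path, chosen so that the exec is guaranteed to fail.
        try_exec_alert(["/nonexistent/alert"], exit_on_failure)
        os._exit(42)  # reached only when the child was not stopped above
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 42


if __name__ == "__main__":
    print("return after failed exec, child keeps running:",
          child_keeps_running(exit_on_failure=False))
    print("exit after failed exec, child keeps running:",
          child_keeps_running(exit_on_failure=True))
```

In this sketch the stray child is stopped deliberately so the demonstration cleans up after itself. In a long-running daemon nothing stops it, which would match the extra processes that stick around.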





And I have another question.
The number of maxprocs is set to 200.

I'm monitoring a total of 102 services

grep service /etc/mon/mon.cf |wc -l
102

Is there a relation between those two numbers? About 30 of those 102 run
every minute, others every 3 minutes, others every 5 minutes.


There is a relation, but it's not what you're thinking it is. maxprocs is the maximum number of child processes mon will run at the same time. If your monitor scripts take a long time to complete, you need the number to be high enough that all your scripts can be scheduled on a regular basis. But if your scripts are all short-lived (most are), then it's just used as a way to prevent mon from spawning everything at once during a restart or other unusual event. 200 is almost certainly *way* higher than you need. From my overloaded mon server:
% grep service mon.cfg | wc
   704    2219   23043
% grep maxprocs mon.cfg
maxprocs = 20

(This server runs about 500K tests per day...)
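
For a rough feel of the numbers, here is a back-of-the-envelope sketch in Python. It uses Little's law (average number running = start rate x average runtime), which is a simplification and not how mon schedules anything. The 36/36 split of the remaining 72 services between 3 and 5 minute intervals and the 2 second average runtime are assumptions for illustration; only the 30 one-minute services, the 500K tests per day and maxprocs = 20 come from this thread.

```python
def start_rate(groups):
    """Monitor starts per second for a list of (service_count, interval_seconds)."""
    return sum(count / interval for count, interval in groups)


def mean_concurrency(groups, mean_runtime_s):
    """Average number of monitors running at once, by Little's law."""
    return start_rate(groups) * mean_runtime_s


def max_mean_runtime(starts_per_second, maxprocs):
    """Longest average runtime that still fits under maxprocs on average."""
    return maxprocs / starts_per_second


if __name__ == "__main__":
    # 30 services every minute is from the thread; the 36/36 split is assumed.
    groups = [(30, 60), (36, 180), (36, 300)]
    # 2.0 seconds is an assumed average monitor runtime.
    print("average concurrent monitors:", round(mean_concurrency(groups, 2.0), 2))

    # 500K tests per day against maxprocs = 20, both quoted above.
    per_second = 500_000 / 86_400
    print("average runtime that fits under 20 procs (s):",
          round(max_mean_runtime(per_second, 20), 3))
```

Under those assumptions the 102-service setup averages fewer than two monitors running at once, so the limit only matters during bursts such as a restart.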

-David

David Nolan                    <*>                    [EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
     a herd of rogue emacs fsck your troff and vgrind your pathalias!

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
