check program FOO with path BAR

Problem solved.
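
For illustration, the "check program FOO with path BAR" form might be filled in roughly as follows. This is only a sketch under assumptions: the service name process1_watch and the wrapper script /usr/local/bin/procwatch.sh are hypothetical and do not come from the original thread. The script is assumed to exit 0 while process1 is absent and non-zero while it is running, so an idle host should not count against overall health.

```
# Hypothetical service name and script path, for illustration only.
# Assumes procwatch.sh exits 0 when process1 is not running and
# non-zero while it is running.
check program process1_watch with path "/usr/local/bin/procwatch.sh process1"
    group processing
    if status != 0 then alert
```

With monit's default alert settings, a notification should go out when the status turns non-zero (the process has started) and again when it returns to 0 (the process has ended).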


On Tue, Sep 17, 2013 at 5:22 AM, Sean Penticoff <[email protected]> wrote:

> Hi,
> Let me take a moment to describe what I'm trying to do, in case my tack
> is all wrong.
> We have several systems that process data for users. The programs the
> users run all live in a shared space and run in user space at the users'
> discretion. I would like to use monit to alert when one of these processes
> is started and to track its memory and CPU usage, alerting further when
> the CPU or memory of that process exceeds a certain threshold (and
> possibly renicing it via some script).
> I've currently set up checks like this:
> check process process1
>     matching "process1"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
> check process process2
>     matching "process2"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
> check process process3
>     matching "process3"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
>
>
> ...and it goes on for another dozen or so processes.
>
> This "works" but is not ideal. What would be ideal is something more
> along the lines of:
> check process process1
>     matching "process1"
>     alert on statechange  (basically, ignore the fact that this process is
> not running, but let me know when it starts and ends [i.e. alert on a
> state change], and monitor it while it is running)
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
>
> Also, we are using M/Monit, and every process on every machine that is NOT
> running shows up as a hit against overall health,
> i.e. under the host status:
> Status  10 out of 27 services are available
>
> and on the main dashboard:
>
> ×[Sep 16 2013 15:59:47] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process1: process is not running
> ×[Sep 16 2013 15:59:44] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process2: process is not running
> ×[Sep 16 2013 15:59:40] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process3: process is not running
> ×[Sep 16 2013 15:59:35] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process4: process is not running
> ...multiplied by 20+ hosts. You get the idea.
>
> The fact that a process isn't running is never a problem. I would like
> to reflect that somehow, and also to have some insight into what's
> running where.
>
> Another thing I would really like to be able to do is pass args in the
> alert emails,
>
> i.e. when the command process1 -t foo -o bar -cfg process1.cfg -v -X -s
> is run, I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v
> -X -s" (or even the entire content of monit procmatch) into the alert
> somehow.
>
> I've only had this up and running for about a month, and monit has saved
> my bacon on filesystem checks and dead services several times. I just want
> to do a bit more with it than the system side of things.
>
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>



-- 
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\