On 07/12/2013 10:28 AM, Brenda J. Butler wrote:
> 
> 
> I don't know oswatcher, but based on your description the following
> would be useful for you:
> 
> 
> munin (keeps a constant-sized database, which thins out as you look back
> in time).
A ten-second look suggests it's overkill, but I will look at it more.
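For what it's worth, the "constant size / thins out" behaviour is just RRD
consolidation (munin keeps its data in RRD files). A minimal sketch using the
python rrdtool bindings, assuming python-rrdtool is installed; the DS/RRA
numbers are purely illustrative:

import rrdtool

# One data source sampled every 5 minutes, plus three round-robin archives
# at decreasing resolution.  The file size is fixed at creation time: old
# rows get overwritten, so anything older than a day only survives as an
# hourly average, and older than a week only as a daily average.
rrdtool.create(
    "load.rrd",
    "--step", "300",                 # expect one sample every 300 s
    "DS:load:GAUGE:600:0:U",         # data source "load", heartbeat 600 s
    "RRA:AVERAGE:0.5:1:288",         # 5-min samples kept for 1 day
    "RRA:AVERAGE:0.5:12:168",        # 1-hour averages kept for 1 week
    "RRA:AVERAGE:0.5:288:365",       # 1-day averages kept for 1 year
)

That is roughly what munin does per metric, which is why it never grows but
also why the old detail disappears.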

> 
> nagios
Definitely overkill. We're using nagios for other things, but what I'm after
is not monitoring so much as a tool to use after the monitoring has alerted
that something is bad. At that point I want to know what led up to all the
memory being used, or which process consumed all the CPU/IO, because once
the alert fires it often gets resolved with a big shotgun like a reboot
(like when they accidentally started 40 instances of a java app on a server
designed for 4), and we are left to explain what happened without logs.


On 07/12/2013 01:36 PM, Jeffrey Moncrieff wrote:
> You can also try zenoss.
>
I'll check on that later.

> 
> In both cases, if there is some test they don't already do, you can
> write your own and have them use it.
> 
Well, google did find https://github.com/stephenlang/scrutiny, and that's
about the closest I've seen to what I'm looking for, but it's a bit too basic.

Since, after all, there isn't that much to it, I started writing something
that I will try out over the weekend. I know one challenge will be actually
collecting anything while the system is crawling, but anything is better
than what we have now, which is nothing (besides 1-minute sar data, which
tends to stop before the system dies).
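Here is a very rough sketch of the direction I'm heading (untested; the
interval, process count and log path are just placeholders). It reads /proc
directly instead of shelling out to ps/vmstat, so it never has to fork --
which is exactly the thing that starts failing once the box is out of memory:

#!/usr/bin/env python3
"""Minimal post-mortem collector: every INTERVAL seconds, append a
timestamped snapshot of load, memory and the top processes (by RSS)
to a plain-text log."""

import os
import time

INTERVAL = 10                     # seconds between snapshots (placeholder)
TOP_N = 10                        # processes to record per snapshot
LOGFILE = "/var/log/postmortem.log"

def read_loadavg():
    with open("/proc/loadavg") as f:
        return f.read().strip()

def read_meminfo():
    wanted = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapFree")
    out = []
    with open("/proc/meminfo") as f:
        for line in f:
            if line.split(":")[0] in wanted:
                out.append(" ".join(line.split()))
    return "  ".join(out)

def top_processes(n):
    procs = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/statm" % pid) as f:
                rss_pages = int(f.read().split()[1])
            with open("/proc/%s/comm" % pid) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue              # process exited while we were reading it
        procs.append((rss_pages, pid, comm))
    procs.sort(reverse=True)
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    return ["%8d kB  pid %-6s %s" % (rss * page_kb, pid, comm)
            for rss, pid, comm in procs[:n]]

def main():
    # Keep the log open and line-buffered so a snapshot never depends on
    # a fresh open() succeeding on a box that is already wedged.
    with open(LOGFILE, "a", buffering=1) as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("=== %s  load: %s\n" % (stamp, read_loadavg()))
            log.write("    %s\n" % read_meminfo())
            for line in top_processes(TOP_N):
                log.write("    %s\n" % line)
            time.sleep(INTERVAL)

if __name__ == "__main__":
    main()

Plain text on purpose: if the box dies, the last lines of the log should
still be readable with tail, no tooling required.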

/ps


