I just discovered the conversation about collectl in a list archive and
thought I'd jump in.
When I first wrote collectl over 10 years ago, we felt we needed a more
powerful and flexible tool than sar to work with our High Performance
customers at HP.  For example, we needed to record many more types of
information than sar does, such as InfiniBand and Lustre File System
statistics.  How about IPMI data such as temperatures or fan speeds?  Power
consumption?  Anybody remember the Quadrics interconnect?  Collectl does
that too, but there's a whole lot more to collectl than just the types of
data it collects.

Rather than repeat what's on the website, http://collectl.sourceforge.net/,
I'll let you read about the features yourselves.  Suffice it to say it runs
on some of the world's largest clusters, sampling hundreds of data points
every 10 seconds while using less than 0.1% of a CPU.

Even better, there are two utilities that make it more useful:
http://collectl-utils.sourceforge.net/.  colplot lets you produce
high-resolution plots for dozens (or more) of nodes via a browser.  colmux
lets you monitor hundreds of nodes in real time from a single window, much
like top.  But unlike top, which only shows the top processes, colmux can do
that as well as show top-anything, or at least anything collectl can report.
For example, if you had dozens of servers, each with dozens of disks, you
could use colmux to find the disks with the longest wait times.  Or how
about the systems with the highest temperatures?

Anyhow, check it out and see for yourself.

-mark
