Hi,
I'm new to Ganglia, but have developed network and system management software
in the past.
I'd like to extend Ganglia to provide more per-device information than it does
by default, and have done so with Python plug-ins and a couple of reports in
/srv/www/htdocs/ganglia/graph.d/. I did this because the default network and
disk performance reports aggregate statistics. This may be useful sometimes,
but if there is a performance bottleneck I want to be able to "drill down" and
see:
- which devices, if any, were over-utilized
- what kind of workload they were being asked to do
The intent is not to leave these metrics running all the time, which would be
expensive, but to be able to turn them on when further analysis is needed. I
defined per-device metrics, but unfortunately they flood the host-level display
with graphs, making it almost unusable, and I couldn't see a way to select just
the graphs for a specific device without changing the Ganglia scripts.
- Can you give me any suggestions on how to make Ganglia do this? There is a
"context" variable in the .php scripts that seems to indicate the level in the
cluster hierarchy that we're interested in. Could an additional "device" level
be added to this? I've tried this by passing an additional parameter in the
URL, such as "d=sda" for the /dev/sda device, but I haven't got it completely
working yet.
- How do I filter out per-device metric graphs from the host-level context
while still making the metrics available to reports? I've added a "context"
field to the metric descriptor returned by gmond; if it is set, I don't display
the metric unless the context matches. This prevents the "host" context from
getting flooded with graphs.
- Also, is there an easy way to take a set of device utilization statistics on
a host and aggregate them using MAX instead of AVERAGE to get a per-host
"max_utilization" metric? I saw __SUMMARY_INFO__, but I'm guessing that this is
averaged data and that it's being averaged the wrong way for my purpose. For
example, I get the average disk utilization for /dev/sdb across the cluster.
The average utilization over a set of devices is frequently misleading unless
the load is very well balanced across them, in which case MAX and AVERAGE
should be much closer anyway.
- It was easy to add reports at the host level by changing the hardcoded report
lists in two places, get_context.php and templates/default/host_view.tpl. Could
we change these to look at the list of graph.d/*_report.php files and build the
list of reports dynamically somehow? host_view.tpl does not have a place in it
for PHP code, right? Why not name each report after the context that it
supports, something like "graph.d/host_network_report.php", so the generic
get_context code could generate a report list by searching for
graph.d/host_*_report.php? Some of these entries could be softlinks if a .php
graphing script supports multiple contexts.
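
To make the context-filtering and MAX-vs-AVERAGE questions above more concrete,
here is a small standalone Python sketch of what I have in mind. It does not
call any Ganglia API: the "context" and "device" keys, the metric names, the
helper functions and the sample values are all made up for illustration, so
please read it as a statement of intent rather than as working Ganglia code.

```python
from statistics import mean

# Hypothetical metric records. The "context" and "device" keys are my own
# additions for illustration; they are not standard gmond descriptor fields.
METRICS = [
    {"name": "load_one", "value": 0.7},
    {"name": "disk_util_sda", "value": 12.0, "context": "device", "device": "sda"},
    {"name": "disk_util_sdb", "value": 97.0, "context": "device", "device": "sdb"},
    {"name": "disk_util_sdc", "value": 8.0, "context": "device", "device": "sdc"},
]


def visible_metrics(metrics, context, device=None):
    """Return the metrics that should be graphed in the given context.

    A metric with no "context" key is treated as a host-level metric, so
    existing metrics keep showing up in the host view as they do today.
    """
    selected = []
    for m in metrics:
        if m.get("context", "host") != context:
            continue  # wrong level of the hierarchy, do not graph it here
        if device is not None and m.get("device") != device:
            continue  # a specific device was requested and this is not it
        selected.append(m)
    return selected


def host_utilization(metrics, prefix="disk_util_"):
    """Aggregate per-device utilization on one host with both MAX and AVERAGE."""
    values = [m["value"] for m in metrics if m["name"].startswith(prefix)]
    return {"max": max(values), "avg": mean(values)}


if __name__ == "__main__":
    print([m["name"] for m in visible_metrics(METRICS, "host")])
    print([m["name"] for m in visible_metrics(METRICS, "device", "sdb")])
    print(host_utilization(METRICS))
```

With these sample values the host view only contains load_one, a request for
device "sdb" only contains disk_util_sdb, and the per-host aggregate shows why
I want MAX: one disk is at 97% while the average over the three disks is 39%.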
Thanks in advance for your advice,
-Ben England
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers